
VIVEKANANDHA COLLEGE OF TECHNOLOGY FOR WOMEN

Approved by AICTE, New Delhi & Affiliated to Anna University, Chennai

Elayampalayam, Tiruchengode Tk – 637205, Namakkal Dt.

DEPARTMENT OF ______________________________________________
COURSE FILE INDEX – THEORY COURSE
Name of the Faculty
Designation and Department
Course Code and Name
Department to which the Course is offered
Academic Year / Semester / Section
SL.No  List/Detail (Yes/No)                                    SL.No  List/Detail (Yes/No)
1      Vision, Mission of the Institute & Department           9      Personal log book
2      Authenticated Syllabus Copy                             10     Consolidated Internal Mark Statement
3      Student's Nominal Roll (Approved Copy)                  11     Course End Survey
4      Individual Time Table                                   12     Feedback on Course Content
5      Course Coordinators Meeting Minutes (if applicable)     13     Feedback on Subject Handling Faculty
6      Lesson Plan - Theory                                    14     End Semester Result
7      Lecture Notes & Other Teaching Materials                15     CO Attainment
8      Previous Year Question Papers                           16     CO and PO Attainment Report

17     Internal Assessment Test / End Semester Examination (tick the corresponding cell)
       Test No. | Question Paper | Answer Key | Result Analysis | Sample Answer Scripts | Slow Learners List & Remedial Measures
       1
       2
       3
       End Semester

18     Other Assessments (Quiz / Open Book Test / Assignment etc.) (tick the corresponding cell)
       Type | Question Paper | Answer Key with Rubrics (if any) | Mark Details / Statement | Sample Answer Scripts | Remarks


Course Incharge(s) HoD / ______


Institute Vision Statement

 To emerge as a premier institute of international repute by imparting
quality technical education with ethical values to produce
professionally competent and socially responsible women engineers.
Institute Mission (IM) Statements
IM-1: Offering value-based engineering education through innovative teaching
methodologies to bring out technically capable, ethically strong, and
quality professionals.

IM-2: Providing an outstanding infrastructure that creates a conducive
teaching and learning environment for faculty and students.

IM-3: Integrating academics with industry for enhancing innovation and
research in young minds, thereby empowering the students to meet
global standards.

IM-4: Establishing a women community, contributing solutions to social
challenges as principled engineers.

Department Vision Statement

 To impart high-quality technical education with ethical values, evolve as
a center of proficiency in the field of Artificial Intelligence & Data
Science, promote collaborative research, lifelong learning and
entrepreneurship, and produce industry-ready engineers to meet global
challenges and societal needs.

Department Mission (DM) Statements


DM-1: Empowering women students with innovative, modern and cognitive

skills in the field of Artificial Intelligence and Data Science.

DM-2: Become a Centre of Excellence in the field of Artificial Intelligence and
Data Science and produce technocrats who are able to provide solutions to
inter-disciplinary applications to meet societal needs.

DM-3: Enable integration of academics and industry, to bring innovative ideas
to the students' minds, thus empowering them to meet the global challenges.

DM-4: Encourage the students for higher education, research and
entrepreneurship in Artificial Intelligence and Data Science.


AL3391 ARTIFICIAL INTELLIGENCE
L T P C
3 0 0 3
COURSE OBJECTIVES:
The main objectives of this course are to:
• Learn the basic AI approaches
• Develop problem solving agents
• Perform logical and probabilistic reasoning

UNIT I INTELLIGENT AGENTS 9


Introduction to AI – Agents and Environments – concept of rationality – nature of
environments – structure of agents. Problem solving agents – search algorithms –
uninformed search strategies.

UNIT II PROBLEM SOLVING 9


Heuristic search strategies – heuristic functions. Local search and optimization problems
– local search in continuous space – search with non-deterministic actions – search in
partially observable environments – online search agents and unknown environments.

UNIT III GAME PLAYING AND CSP 9


Game theory – optimal decisions in games – alpha-beta search – monte-carlo tree search –
stochastic games – partially observable games. Constraint satisfaction problems –
constraint propagation – backtracking search for CSP – local search for CSP – structure of
CSP.

UNIT IV LOGICAL REASONING 9


Knowledge-based agents – propositional logic – propositional theorem proving –
propositional model checking – agents based on propositional logic. First-order logic –
syntax and semantics – knowledge representation and engineering – inferences in first-
order logic – forward chaining – backward chaining – resolution.

UNIT V PROBABILISTIC REASONING 9


Acting under uncertainty – Bayesian inference – naïve Bayes models. Probabilistic
reasoning – Bayesian networks – exact inference in BN – approximate inference in BN –
causal networks.

COURSE OUTCOMES:
At the end of this course, the students will be able to:
CO1: Explain intelligent agent frameworks
CO2: Apply problem solving techniques
CO3: Apply game playing and CSP techniques
CO4: Perform logical reasoning
CO5: Perform probabilistic reasoning under uncertainty
TOTAL:45 PERIODS
TEXT BOOKS:
1. Stuart Russell and Peter Norvig, “Artificial Intelligence – A Modern Approach”,
Fourth Edition, Pearson Education, 2021.

REFERENCES:
1. Dan W. Patterson, “Introduction to AI and ES”, Pearson Education,2007
2. Kevin Knight, Elaine Rich, and Nair B., “Artificial Intelligence”, McGraw Hill,
2008
3. Patrick H. Winston, "Artificial Intelligence", Third Edition, Pearson Education,
2006
4. Deepak Khemani, “Artificial Intelligence”, Tata McGraw Hill Education, 2013.
5. http://nptel.ac.in/
VIVEKANANDHA COLLEGE OF TECHNOLOGY FOR WOMEN
Approved by AICTE, New Delhi & Affiliated to Anna University, Chennai
Elayampalayam, Tiruchengode Tk – 637205, Namakkal Dt.

DEPARTMENT OF ARTIFICIAL INTELLIGENCE AND DATA SCIENCE

STUDENT NAME LIST

Academic Year: 2024-2025 Year/Sem: II/III

S.No | Reg. No | Name of the Student | Remarks

1 613023243001 Aashika S

2 613023243002 Abinaya K

3 613023243003 Agalya V

4 613023243004 Akila K

5 613023243006 Anusree M

6 613023243007 Arunaethi M S

7 613023243008 Arunika E

8 613023243009 Chella

9 613023243010 Devasri K

10 613023243011 Dharshini G S

11 613023243012 Dhivyalakshmi V

12 613023243013 Elangani E

13 613023243014 Hemadharshini M

14 613023243015 Hemamalini L

15 613023243016 Hemavathi S

16 613023243017 Indhuja P

17 613023243018 Indhumathi E

18 613023243019 Janani S

19 613023243020 Jaya S

20 613023243021 Joshika S

21 613023243022 Kalpana J

22 613023243023 Kanishka J
23 613023243024 Kanmani V

24 613023243025 Kaviya C

25 613023243026 Kaviya K

26 613023243027 Keerthana P

27 613023243028 Kiruthika R

28 613023243029 Kiruthika S

29 613023243030 Kowshika S

30 613023243031 Lathasree S

31 613023243032 Madhumitha M

32 613023243033 Mahalakshmi S

33 613023243034 Nandhini R

34 613023243035 Nasiha M

35 613023243036 Navya K

36 613023243037 Nellore Mokshitha

37 613023243038 Nithya Sri R

38 613023243039 Pavithra B

39 613023243040 Punitham S

40 613023243041 Reemas M

41 613023243042 Rithicka M

42 613023243043 Sandhiya R

43 613023243044 Sandhiya S

44 613023243045 Selvamithra S

45 613023243046 Shree Varshini K

46 613023243047 Sophiya B

47 613023243048 Sowbarnika S J

48 613023243049 Srivarshini K

49 613023243050 Suganthi S

50 613023243051 Suganya V

51 613023243052 Sujitha M

52 613023243053 Swetha C

53 613023243054 Thamanna Hazin A


54 613023243055 Thasneen S

55 613023243056 Thejeswini S

56 613023243057 Thennarasi A

57 613023243058 Thiru Malini S G

58 613023243059 Varsha A

59 613023243060 Vethavalli J
VIVEKANANDHA COLLEGE OF TECHNOLOGY FOR WOMEN
Approved by AICTE, New Delhi & Affiliated to Anna University, Chennai
Elayampalayam, Tiruchengode Tk – 637205, Namakkal Dt.

DEPARTMENT OF ARTIFICIAL INTELLIGENCE AND DATA SCIENCE VCTW


Faculty Timetable – Odd Semester 2024-25 AC01 06 Rev.0

Faculty Name : Mr.S.Shankar


Designation : Assistant Professor Semester : Odd
Academic Year : 2024 - 2025 w.e.f : 12.08.2024

Period    I              II             III            IV             V              VI             VII
Time      09:30 AM to    10:20 AM to    11:15 AM to    12:00 PM to    01:40 PM to    02:25 PM to    03:20 PM to
          10:20 AM       11:05 AM       12:00 PM       12:45 PM       02:25 PM       03:10 PM       04:00 PM
(Break: 11:05 – 11:15 AM; Lunch Break: 12:45 – 01:40 PM; Break: 03:10 – 03:20 PM)

Day            Allotted Course
Monday         AI
Tuesday        AI
Wednesday      –
Thursday       AI
Friday         AI

AL3391 ARTIFICIAL INTELLIGENCE

UNIT I

Introduction to AI:

What is Artificial Intelligence?

In today's world, technology is growing very fast, and we are getting in touch with different new
technologies day by day.

Here, one of the booming technologies of computer science is Artificial Intelligence, which is
ready to create a new revolution in the world by making intelligent machines. Artificial
Intelligence is now all around us. It is currently at work in a variety of subfields, ranging from
the general to the specific, such as self-driving cars, playing chess, proving theorems, playing
music, painting, etc.

AI is one of the fascinating and universal fields of Computer Science, and it has great scope in the future.

Artificial Intelligence is composed of two words, Artificial and Intelligence, where Artificial
means "man-made" and Intelligence means "thinking power"; hence AI means "a man-made
thinking power."

So, we can define AI as:

"It is a branch of computer science by which we can create intelligent machines which can
behave like a human, think like humans, and able to make decisions."

Artificial Intelligence exists when a machine has human-like skills such as learning,
reasoning, and problem solving.

With Artificial Intelligence you do not need to preprogram a machine for every task; instead,
you can create a machine with programmed algorithms which can work with its own
intelligence, and that is the awesomeness of AI.

AI is not an entirely new idea: according to Greek myth, there were mechanical men in early
days which could work and behave like humans.

Why Artificial Intelligence?

Before learning about Artificial Intelligence, we should know the importance of AI and why we
should learn it. Following are some main reasons to learn about AI:

o With the help of AI, you can create such software or devices which can solve real-world
problems very easily and with accuracy such as health issues, marketing, traffic issues,
etc.


o With the help of AI, you can create your personal virtual Assistant, such as Cortana,
Google Assistant, Siri, etc.
o With the help of AI, you can build such Robots which can work in an environment where
survival of humans can be at risk.
o AI opens a path for other new technologies, new devices, and new Opportunities.

Goals of Artificial Intelligence

Following are the main goals of Artificial Intelligence:

1. Replicate human intelligence


2. Solve Knowledge-intensive tasks
3. An intelligent connection of perception and action
4. Building a machine which can perform tasks that requires human intelligence such as:
o Proving a theorem
o Playing chess
o Plan some surgical operation
o Driving a car in traffic
5. Creating some system which can exhibit intelligent behavior, learn new things by itself,
demonstrate, explain, and can advise to its user.

What Comprises Artificial Intelligence?

Artificial Intelligence is not just a part of computer science; it is vast and requires lots of
other factors which contribute to it. To create AI, we should first know how intelligence is
composed: intelligence is an intangible faculty of our brain which is a combination
of reasoning, learning, problem solving, perception, language understanding,
etc.

To achieve the above factors for a machine or software, Artificial Intelligence requires the
following disciplines:

o Mathematics
o Biology
o Psychology
o Sociology
o Computer Science
o Neurons Study
o Statistics


Advantages of Artificial Intelligence

Following are some main advantages of Artificial Intelligence:

o High Accuracy with fewer errors: AI machines or systems are less prone to errors and
highly accurate, as they take decisions based on prior experience or information.
o High Speed: AI systems operate at very high speed and make fast decisions; this is why
an AI system can beat a chess champion in the game of chess.
o High reliability: AI machines are highly reliable and can perform the same action
multiple times with high accuracy.
o Useful for risky areas: AI machines can be helpful in situations such as defusing a
bomb or exploring the ocean floor, where employing a human can be risky.
o Digital Assistant: AI can be very useful as a digital assistant for users; for example,
AI technology is currently used by various e-commerce websites to show products
according to customer requirements.
o Useful as a public utility: AI can be very useful for public utilities, such as self-driving
cars which can make our journeys safer and hassle-free, facial recognition for security
purposes, and Natural Language Processing to communicate with humans in human
language, etc.

Disadvantages of Artificial Intelligence

Every technology has some disadvantages, and the same goes for Artificial Intelligence. Despite
being such an advantageous technology, it still has some disadvantages which we need to keep in mind
while creating an AI system. Following are the disadvantages of AI:

o High Cost: The hardware and software requirements of AI are very costly, as it requires lots
of maintenance to meet current world requirements.
o Can't think out of the box: Even though we are making smarter machines with AI, they still
cannot think outside the box: a robot will only do the work for which it is trained
or programmed.


o No feelings and emotions: AI machines can be outstanding performers, but they do
not have feelings, so they cannot form any kind of emotional attachment with humans, and
may sometimes be harmful to users if proper care is not taken.
o Increased dependency on machines: As technology advances, people are
getting more dependent on devices and hence are losing their mental capabilities.
o No Original Creativity: Humans are creative and can imagine new ideas, but AI
machines cannot beat this power of human intelligence and cannot be creative and
imaginative.

Prerequisite

Before learning about Artificial Intelligence, you must have the fundamental knowledge of
following so that you can understand the concepts easily:

o Any computer language such as C, C++, Java, Python, etc.(knowledge of Python will be
an advantage)
o Knowledge of essential Mathematics such as derivatives, probability theory, etc.

Future of Artificial Intelligence:

1. Health Care Industries


India accounts for about 17.7% of the world's population, making it the second most populous
country after China. Health care facilities are not available to all individuals living in the country,
because of the lack of good doctors, inadequate infrastructure, etc. There are still
people who cannot reach doctors or hospitals. AI has the ability to detect
disease based on symptoms; even if you don't go to the doctor, AI could read the data from a
fitness band or the medical history of an individual to analyze the pattern, suggest proper
medication, and even deliver it at one's fingertips through a cell phone.
As mentioned earlier, Google's DeepMind has already beaten doctors in detecting fatal diseases
like breast cancer. It is not far off that AI will detect common diseases as well as
provide proper suggestions for medication. A consequence of this could be less need for
doctors in the long term, resulting in job reduction.
2. AI in Education
The development of a country depends on the quality of education its youth receives. Right now,
there are lots of courses available on AI. But in the future, AI is going to transform
the classical way of education. The world no longer needs skilled labourers for manufacturing
industries, where they are mostly replaced by robots and automation. The education system could be
quite effective and tailored to an individual's personality and ability. It would give
brighter students a chance to shine and weaker students a better way to cope.


Right Education can enhance the power of individuals/nations; on the other hand, misuse of the
same could lead to devastating results.

3. AI in Finance
Quantification of growth for any country is directly related to its economic and financial
condition. As AI has enormous scope in almost every field, it has great potential to boost
the economic health of individuals and nations. Nowadays, AI algorithms are being used in
managing equity funds.

An AI system could take a large number of parameters into account while figuring out the best way to
manage funds, and it could perform better than a human manager. AI-driven strategies in the field of finance
are going to change the classical way of trading and investing. This could be devastating for some
fund-managing firms that cannot afford such facilities and could affect business on a large scale,
as decisions would be quick and abrupt. The competition would be tough and on edge all the
time.

4. AI in Military and Cybersecurity


AI-assisted military technologies have built autonomous weapon systems which won't need
humans at all, hence providing the safest way to enhance the security of a nation. We could see
robot soldiers in the near future which are as intelligent as a soldier or commando and able
to perform certain tasks.

AI-assisted strategies would enhance mission effectiveness and provide the safest way to
execute missions. The concerning part of AI-assisted systems is that how their algorithms reach
decisions is not quite explainable. Deep neural networks learn fast and keep learning
continuously, so the main problem here is explainable AI. Such systems could produce devastating
results when they reach the wrong hands or make wrong decisions on their own.

Intelligent Agents:

Types of AI Agents or (Structure of Agents):

Agents can be grouped into five classes based on their degree of perceived intelligence and
capability. All these agents can improve their performance and generate better actions over
time. These are given below:


o Simple Reflex Agent


o Model-based reflex agent
o Goal-based agents
o Utility-based agent
o Learning agent

1. Simple Reflex agent:

o The Simple reflex agents are the simplest agents. These agents take decisions on the basis
of the current percepts and ignore the rest of the percept history.
o These agents only succeed in the fully observable environment.
o The Simple reflex agent does not consider any part of percepts history during their
decision and action process.
o The Simple reflex agent works on the Condition-action rule, which means it maps the current
state directly to an action (see the sketch after this list). For example, a Room Cleaner agent works only if there is dirt in the room.
o Problems for the simple reflex agent design approach:
o They have very limited intelligence
o They do not have knowledge of non-perceptual parts of the current state
o Mostly too big to generate and to store.
o Not adaptive to changes in the environment.
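
As a minimal, illustrative sketch (assuming the classic two-location vacuum world, with hypothetical percept and action names), a simple reflex agent in Python is just a handful of condition-action rules:

# A simple reflex agent for the two-location vacuum world.
# The percept is the pair (location, status); the agent keeps no history
# and maps each percept straight to an action via condition-action rules.

def simple_reflex_vacuum_agent(percept):
    location, status = percept
    if status == 'Dirty':        # rule: dirt present -> suck it up
        return 'Suck'
    elif location == 'A':        # rule: at A and clean -> move right
        return 'Right'
    else:                        # rule: at B and clean -> move left
        return 'Left'

# The rules in action for each possible percept:
for percept in [('A', 'Dirty'), ('A', 'Clean'), ('B', 'Dirty'), ('B', 'Clean')]:
    print(percept, '->', simple_reflex_vacuum_agent(percept))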

2. Model-based reflex agent

o The Model-based agent can work in a partially observable environment, and track the
situation.
o A model-based agent has two important factors:
o Model: It is knowledge about "how things happen in the world," so it is called a
Model-based agent.
o Internal State: It is a representation of the current state based on percept history.
o These agents have the model, "which is knowledge of the world" and based on the model
they perform actions.
o Updating the agent state requires information about:
a. How the world evolves


b. How the agent's action affects the world.

3. Goal-based agents

o Knowledge of the current state of the environment is not always sufficient for an
agent to decide what to do.
o The agent needs to know its goal which describes desirable situations.
o Goal-based agents expand the capabilities of the model-based agent by having the "goal"
information.
o They choose an action, so that they can achieve the goal.
o These agents may have to consider a long sequence of possible actions before deciding
whether the goal is achieved or not. Such consideration of different scenarios is called
searching and planning, which makes an agent proactive.

4. Utility-based agents

o These agents are similar to the goal-based agent but provide an extra component of utility
measurement which makes them different by providing a measure of success at a given
state.


o A Utility-based agent acts based not only on goals but also on the best way to achieve the goal.
o The Utility-based agent is useful when there are multiple possible alternatives, and an
agent has to choose in order to perform the best action.
o The utility function maps each state to a real number to check how efficiently each action
achieves the goals.

5. Learning Agents

o A learning agent in AI is the type of agent which can learn from its past experiences; that is, it
has learning capabilities.
o It starts to act with basic knowledge and is then able to act and adapt automatically through
learning.
o A learning agent has mainly four conceptual components, which are:
a. Learning element: It is responsible for making improvements by learning from the
environment.
b. Critic: The learning element takes feedback from the critic, which describes how
well the agent is doing with respect to a fixed performance standard.
c. Performance element: It is responsible for selecting external action
d. Problem generator: This component is responsible for suggesting actions that
will lead to new and informative experiences.
Hence, learning agents are able to learn, analyze performance, and look for new ways to improve
the performance.


Nature of Environments:

The environment is the Task Environment (problem) for which the Rational Agent is the
solution. Any task environment is characterised on the basis of PEAS.

1. Performance – the performance measure which determines whether the agent is
successful or not. For example, for the vacuum-cleaner agent above, a clean floor and optimal energy
consumption might be performance measures.
2. Environment – the physical characteristics and constraints expected, for example wood floors,
furniture in the way, etc.
3. Actuators – the physical or logical constructs which take action; for example, for the
vacuum cleaner, these are the suction pumps.
4. Sensors – again, physical or logical constructs which sense the environment.

Rational Agents could be physical agents like the one described above, or a
program that operates in a non-physical environment such as an operating system. Imagine a bot
designed to operate a web site by scanning Internet news sources and showing the interesting items to its
users, while selling advertising space to generate revenue.
As another example, consider an online tutoring system:

Agent: Math e-learning system
Performance: SLA-defined score on the test
Environment: Student, Teacher, Parents
Actuators: Computer display for learning exercises, corrections, feedback
Sensors: Keyboard, Mouse

Environments can further be classified into various buckets. This would help determine the
intelligence which would need to be built in the agent. These are

 Observable – Full or partial? If the agent's sensors have full access to the state, the agent does not need to pre-
store any information. Partial observability may be due to inaccuracy of sensors or incomplete information
about the environment, like limited access to enemy territory.
 Number of Agents – The vacuum cleaner works in a single-agent environment, but for
driver-less taxis, every driver-less taxi is a separate agent, hence a multi-agent environment.
 Deterministic – The number of unknowns in the environment which affect the predictability of
the environment. For example, floor space for cleaning is mostly deterministic (the furniture is
where it is most of the time), but taxi driving on a road is non-deterministic.
 Discrete – Does the agent respond only when needed, or does it have to continuously scan the
environment? Driver-less driving is continuous; an online tutor is discrete.
 Static – How often does the environment change? Can the agent learn about the environment and
always do the same thing?
 Episodic – If the response to a certain percept does not depend on the previous ones, i.e. it is
stateless (like static methods in Java), then the environment is episodic. If the decision taken now influences
future decisions, then it is a sequential environment.

Agents in Artificial Intelligence

An AI system can be defined as the study of the rational agent and its environment. The agents
sense the environment through sensors and act on their environment through actuators. An AI
agent can have mental properties such as knowledge, belief, intention, etc.

What is an Agent?

An agent can be anything that perceives its environment through sensors and acts upon that
environment through actuators. An agent runs in a cycle of perceiving, thinking, and acting.
An agent can be:

o Human-Agent: A human agent has eyes, ears, and other organs which work for sensors
and hand, legs, vocal tract work for actuators.
o Robotic Agent: A robotic agent can have cameras, infrared range finder, NLP for
sensors and various motors for actuators.


o Software Agent: Software agent can have keystrokes, file contents as sensory input and
act on those inputs and display output on the screen.

Hence the world around us is full of agents such as thermostat, cellphone, camera, and even
we are also agents.

Before moving forward, we should first know about sensors, effectors, and actuators.

Sensor: Sensor is a device which detects the change in the environment and sends the
information to other electronic devices. An agent observes its environment through sensors.

Actuators: Actuators are the component of machines that converts energy into motion. The
actuators are only responsible for moving and controlling a system. An actuator can be an
electric motor, gears, rails, etc.

Effectors: Effectors are the devices which affect the environment. Effectors can be legs, wheels,
arms, fingers, wings, fins, and display screen.

Intelligent Agents:

An intelligent agent is an autonomous entity which acts upon an environment using sensors and
actuators to achieve goals. An intelligent agent may learn from the environment to achieve
its goals. A thermostat is an example of an intelligent agent.

Following are the main four rules for an AI agent:

o Rule 1: An AI agent must have the ability to perceive the environment.


o Rule 2: The observation must be used to make decisions.
o Rule 3: Decision should result in an action.
o Rule 4: The action taken by an AI agent must be a rational action.

Rational Agent:

A rational agent is an agent which has clear preferences, models uncertainty, and acts in a way that
maximizes its performance measure over all possible actions.

A rational agent is said to perform the right things. AI is about creating rational agents, which are used
in game theory and decision theory for various real-world scenarios.


For an AI agent, rational action is most important because in reinforcement learning
algorithms, for each best possible action the agent gets a positive reward, and for each wrong
action the agent gets a negative reward.

Note: Rational agents in AI are very similar to intelligent agents.

Rationality:

The rationality of an agent is measured by its performance measure. Rationality can be judged on
the basis of the following points:

o Performance measure which defines the success criterion.


o Agent prior knowledge of its environment.
o Best possible actions that an agent can perform.
o The sequence of percepts.

Note: Rationality differs from Omniscience, because an omniscient agent knows the actual
outcome of its action and acts accordingly, which is not possible in reality.

Structure of an AI Agent

The task of AI is to design an agent program which implements the agent function. The structure
of an intelligent agent is a combination of architecture and agent program. It can be viewed as:

Agent = Architecture + Agent program

Following are the main three terms involved in the structure of an AI agent:

Architecture: Architecture is the machinery that the AI agent executes on.

Agent Function: The agent function maps a percept sequence to an action:

f : P* → A

Agent program: The agent program is an implementation of the agent function. The agent program
executes on the physical architecture to produce the function f.
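
As a minimal, illustrative sketch of this distinction (the percepts, actions, and table entries below are hypothetical), a table-driven agent program implements the agent function f : P* → A by looking up the complete percept sequence seen so far:

# A table-driven agent program: one concrete (if impractical) way to
# implement the agent function f : P* -> A. The table maps each possible
# percept sequence to an action; practical agent programs compute the
# action instead, because this table grows exponentially with history.

percept_history = []                       # the percept sequence P*

action_table = {                           # hypothetical entries
    (('A', 'Dirty'),): 'Suck',
    (('A', 'Dirty'), ('A', 'Clean')): 'Right',
    (('A', 'Dirty'), ('A', 'Clean'), ('B', 'Dirty')): 'Suck',
}

def table_driven_agent(percept):
    percept_history.append(percept)
    return action_table.get(tuple(percept_history), 'NoOp')

print(table_driven_agent(('A', 'Dirty')))   # -> Suck
print(table_driven_agent(('A', 'Clean')))   # -> Right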

PEAS Representation

PEAS is a type of model upon which an AI agent works. When we define an AI agent or
rational agent, we can group its properties under the PEAS representation model. It is made up
of four words:

o P: Performance measure
o E: Environment
o A: Actuators


o S: Sensors

Here performance measure is the objective for the success of an agent's behavior.

PEAS for self-driving cars:

Let's suppose a self-driving car then PEAS representation will be:

Performance: Safety, time, legal drive, comfort

Environment: Roads, other vehicles, road signs, pedestrian

Actuators: Steering, accelerator, brake, signal, horn

Sensors: Camera, GPS, speedometer, odometer, accelerometer, sonar.

Example of Agents with their PEAS representation

1. Medical Diagnose
   Performance measure: Healthy patient, Minimized cost
   Environment: Patient, Hospital, Staff
   Actuators: Tests, Treatments
   Sensors: Keyboard (entry of symptoms)

2. Vacuum Cleaner
   Performance measure: Cleanness, Efficiency, Battery life, Security
   Environment: Room, Table, Wood floor, Carpet, Various obstacles
   Actuators: Wheels, Brushes, Vacuum Extractor
   Sensors: Camera, Dirt detection sensor, Cliff sensor, Bump Sensor, Infrared Wall Sensor

3. Part-picking Robot
   Performance measure: Percentage of parts in correct bins
   Environment: Conveyor belt with parts, Bins
   Actuators: Jointed Arms, Hand
   Sensors: Camera, Joint angle sensors

Agent Environment in AI:

An environment is everything in the world which surrounds the agent, but it is not a part of the
agent itself. An environment can be described as a situation in which an agent is present.

The environment is where the agent lives and operates; it provides the agent with something to sense and
act upon. An environment is mostly said to be non-deterministic.


Features of Environment

Environment can have various features from the point of view of an agent:

1. Fully observable vs Partially Observable


2. Static vs Dynamic
3. Discrete vs Continuous
4. Deterministic vs Stochastic
5. Single-agent vs Multi-agent
6. Episodic vs sequential
7. Known vs Unknown
8. Accessible vs Inaccessible

1. Fully observable vs Partially Observable:

o If an agent's sensors can sense or access the complete state of the environment at each point
in time, then it is a fully observable environment; otherwise it is partially observable.
o A fully observable environment is easier, as there is no need to maintain internal state to
keep track of the history of the world.
o If an agent has no sensors at all, then the environment is called
unobservable.

2. Deterministic vs Stochastic:

o If an agent's current state and selected action can completely determine the next state of
the environment, then such environment is called a deterministic environment.
o A stochastic environment is random in nature and cannot be determined completely by an
agent.
o In a deterministic, fully observable environment, agent does not need to worry about
uncertainty.

3. Episodic vs Sequential:

o In an episodic environment, there is a series of one-shot actions, and only the current
percept is required for the action.
o However, in Sequential environment, an agent requires memory of past actions to
determine the next best actions.

4. Single-agent vs Multi-agent

o If only one agent is involved in an environment, and operating by itself then such an
environment is called single agent environment.
o However, if multiple agents are operating in an environment, then such an environment is
called a multi-agent environment.
o The agent design problems in the multi-agent environment are different from single agent
environment.


5. Static vs Dynamic:

o If the environment can change itself while an agent is deliberating then such environment
is called a dynamic environment else it is called a static environment.
o Static environments are easy to deal with, because an agent does not need to keep looking
at the world while deciding on an action.
o However for dynamic environment, agents need to keep looking at the world at each
action.
o Taxi driving is an example of a dynamic environment whereas Crossword puzzles are an
example of a static environment.

6. Discrete vs Continuous:

o If in an environment there are a finite number of percepts and actions that can be
performed within it, then such an environment is called a discrete environment else it is
called continuous environment.
o A chess game comes under a discrete environment, as there is a finite number of moves that
can be performed.
o A self-driving car is an example of a continuous environment.

7. Known vs Unknown

o Known and unknown are not actually a feature of an environment, but it is an agent's
state of knowledge to perform an action.
o In a known environment, the results for all actions are known to the agent. While in
unknown environment, agent needs to learn how it works in order to perform an action.
o It is quite possible that a known environment to be partially observable and an Unknown
environment to be fully observable.

8. Accessible vs Inaccessible

o If an agent can obtain complete and accurate information about the state's environment,
then such an environment is called an Accessible environment else it is called
inaccessible.
o An empty room whose state can be defined by its temperature is an example of an
accessible environment.
o Information about an event on earth is an example of Inaccessible environment.

Search Algorithms in Artificial Intelligence:

Search algorithms are one of the most important areas of Artificial Intelligence.

Problem-solving agents:

In Artificial Intelligence, search techniques are universal problem-solving methods. Rational
agents or problem-solving agents in AI mostly use these search strategies or algorithms to
solve a specific problem and provide the best result. Problem-solving agents are goal-based
agents and use atomic representation. In this topic, we will learn various problem-solving search
algorithms.

Search Algorithm Terminologies:

o Search: Searching is a step-by-step procedure for solving a search problem in a given search
space. A search problem can have three main factors:
a. Search Space: Search space represents the set of possible solutions which a system
may have.
b. Start State: It is the state from which the agent begins the search.
c. Goal test: It is a function which observes the current state and returns whether the
goal state is achieved or not.
Search tree: A tree representation of the search problem is called a search tree. The root of the search
tree is the root node, which corresponds to the initial state.
Actions: It gives the description of all the actions available to the agent.
Transition model: A description of what each action does; it can be represented as a transition
model.
Path Cost: It is a function which assigns a numeric cost to each path.
Solution: It is an action sequence which leads from the start node to the goal node.
Optimal Solution: A solution that has the lowest cost among all solutions.
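
These pieces can be made concrete with a minimal Python sketch (the toy graph, states, and step costs below are illustrative assumptions, not from the syllabus):

# A toy search problem: states, actions, transition model, goal test,
# and path cost encoded explicitly. The uninformed and informed search
# algorithms discussed below all operate on a structure like this.

graph = {                  # transition model: state -> {successor: step cost}
    'S': {'A': 1, 'B': 2},
    'A': {'C': 3},
    'B': {'C': 1},
    'C': {'G': 2},
    'G': {},
}

start_state = 'S'          # start state

def actions(state):
    # available actions: here, simply "move to a neighbouring state"
    return list(graph[state])

def goal_test(state):
    return state == 'G'

def path_cost(path):
    # sum of step costs along a path such as ['S', 'B', 'C', 'G']
    return sum(graph[a][b] for a, b in zip(path, path[1:]))

print(actions('S'))                      # ['A', 'B']
print(goal_test('G'))                    # True
print(path_cost(['S', 'B', 'C', 'G']))   # 5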

Properties of Search Algorithms:

Following are the four essential properties of search algorithms to compare the efficiency of
these algorithms:

Completeness: A search algorithm is said to be complete if it is guaranteed to return a solution
whenever at least one solution exists for any input.

Optimality: If the solution found by an algorithm is guaranteed to be the best solution (lowest
path cost) among all other solutions, then such a solution is said to be an optimal solution.

Time Complexity: Time complexity is a measure of time for an algorithm to complete its task.

Space Complexity: It is the maximum storage space required at any point during the search,
expressed in terms of the complexity of the problem.

Types of search algorithms

Based on the search problems we can classify the search algorithms into uninformed (Blind
search) search and informed search (Heuristic search) algorithms.


Uninformed Search Algorithms:

Uninformed search is a class of general-purpose search algorithms which operate in a brute-force
way. Uninformed search algorithms do not have additional information about the state or search
space other than how to traverse the tree, which is why they are also called blind search.

Following are the various types of uninformed search algorithms:

1. Breadth-first Search
2. Depth-first Search
3. Depth-limited Search
4. Iterative deepening depth-first search
5. Uniform cost search
6. Bidirectional Search

1. Breadth-first Search:

o Breadth-first search is the most common search strategy for traversing a tree or graph.
This algorithm searches breadthwise in a tree or graph, so it is called breadth-first search.
o The BFS algorithm starts searching from the root node of the tree and expands all successor
nodes at the current level before moving to nodes of the next level.
o The breadth-first search algorithm is an example of a general-graph search algorithm.
o Breadth-first search is implemented using a FIFO queue data structure.

Advantages:

o BFS will provide a solution if any solution exists.


o If there is more than one solution for a given problem, then BFS will provide the
minimal solution, i.e. the one that requires the least number of steps.


Disadvantages:

o It requires lots of memory since each level of the tree must be saved into memory to
expand the next level.
o BFS needs lots of time if the solution is far away from the root node.

Example:

In the below tree structure, we have shown the traversal of the tree using the BFS algorithm from
the root node S to the goal node K. The BFS algorithm traverses in layers, so it will follow the path
shown by the dotted arrow, and the traversed path will be:

S---> A--->B---->C--->D---->G--->H--->E---->F---->I---->K

Time Complexity: The time complexity of the BFS algorithm can be obtained from the number of nodes
traversed in BFS until the shallowest node, where d = depth of the shallowest solution and b is the
branching factor (number of successors at every state):

T(b) = 1 + b + b^2 + b^3 + ... + b^d = O(b^d)

Space Complexity: The space complexity of the BFS algorithm is given by the memory size of the
frontier, which is O(b^d).

Completeness: BFS is complete, which means if the shallowest goal node is at some finite
depth, then BFS will find a solution.

Optimality: BFS is optimal if path cost is a non-decreasing function of the depth of the node.
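
A minimal runnable sketch of BFS (the graph below is an illustrative stand-in for the missing figure, chosen so that the goal K sits at depth 3):

from collections import deque

# Breadth-first search with a FIFO queue of paths: the shallowest
# unexpanded path is always taken first.
def breadth_first_search(graph, start, goal):
    frontier = deque([[start]])          # FIFO queue of paths
    visited = {start}
    while frontier:
        path = frontier.popleft()        # shallowest path first
        node = path[-1]
        if node == goal:
            return path
        for successor in graph[node]:
            if successor not in visited:
                visited.add(successor)
                frontier.append(path + [successor])
    return None                          # no solution exists

graph = {'S': ['A', 'B'], 'A': ['C', 'D'], 'B': ['E'],
         'C': [], 'D': ['K'], 'E': [], 'K': []}
print(breadth_first_search(graph, 'S', 'K'))   # ['S', 'A', 'D', 'K']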

2. Depth-first Search

o Depth-first search is a recursive algorithm for traversing a tree or graph data structure.


o It is called the depth-first search because it starts from the root node and follows each
path to its greatest depth node before moving to the next path.
o DFS uses a stack data structure for its implementation.
o The process of the DFS algorithm is similar to the BFS algorithm.

Note: Backtracking is an algorithm technique for finding all possible solutions using recursion.

Advantage:

o DFS requires much less memory, as it only needs to store the stack of nodes on the path
from the root node to the current node.
o It takes less time to reach the goal node than the BFS algorithm (if it traverses the right
path).

Disadvantage:

o There is the possibility that many states keep re-occurring, and there is no guarantee of
finding a solution.
o The DFS algorithm goes deep down in its search, and sometimes it may enter an infinite loop.

Example:

In the below search tree, we have shown the flow of depth-first search, and it will follow the
order as:

Root node--->Left node ----> right node.

It will start searching from root node S and traverse A, then B, then D and E; after traversing E,
it will backtrack the tree, as E has no other successor and the goal node has not yet been found. After
backtracking, it will traverse node C and then G, and here it will terminate as it has found the goal node.

Completeness: DFS search algorithm is complete within finite state space as it will expand
every node within a limited search tree.


Time Complexity: The time complexity of DFS is equivalent to the number of nodes traversed by the
algorithm. It is given by:

T(n) = 1 + b + b^2 + ... + b^m = O(b^m)

where m = the maximum depth of any node, which can be much larger than d (the depth of the shallowest
solution).

Space Complexity: The DFS algorithm needs to store only the single path from the root node, so the
space complexity of DFS is equivalent to the size of the fringe set, which is O(b×m).

Optimal: The DFS search algorithm is non-optimal, as it may take a large number of steps or incur a
high cost to reach the goal node.
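
A minimal sketch of DFS with an explicit stack (the graph mirrors the S, A, B, ... example above under assumed edges):

# Depth-first search with an explicit LIFO stack of paths: the deepest
# unexpanded path is always taken first, backtracking when a branch dies out.
def depth_first_search(graph, start, goal):
    frontier = [[start]]                     # stack of paths
    explored = set()
    while frontier:
        path = frontier.pop()                # deepest path first
        node = path[-1]
        if node == goal:
            return path
        if node not in explored:
            explored.add(node)
            for successor in reversed(graph[node]):  # keep left-to-right order
                frontier.append(path + [successor])
    return None

graph = {'S': ['A', 'C'], 'A': ['B', 'D'], 'B': [], 'D': [], 'C': ['G'], 'G': []}
print(depth_first_search(graph, 'S', 'G'))   # ['S', 'C', 'G'] after backtracking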

3. Depth-Limited Search Algorithm:

A depth-limited search algorithm is similar to depth-first search with a predetermined limit ℓ.
Depth-limited search can solve the drawback of infinite paths in depth-first search. In this
algorithm, a node at the depth limit is treated as if it has no further successors.

Depth-limited search can be terminated with two Conditions of failure:

o Standard failure value: It indicates that the problem does not have any solution.
o Cutoff failure value: It indicates that there is no solution for the problem within the given depth limit.

Advantages:

Depth-limited search is memory efficient.

Disadvantages:

o Depth-limited search also has a disadvantage of incompleteness.

Example:


Completeness: The DLS search algorithm is complete if the solution is above (shallower than) the depth limit.

Time Complexity: The time complexity of the DLS algorithm is O(b^ℓ).

Space Complexity: The space complexity of the DLS algorithm is O(b×ℓ).

Optimal: Depth-limited search can be viewed as a special case of DFS, and it is also not optimal,
even if ℓ > d.
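
A minimal recursive sketch of depth-limited search, distinguishing the two failure values above (the toy graph is an assumption):

# Depth-limited DFS: nodes at depth == limit are treated as having no
# successors. Returns a path, 'cutoff' (the limit was hit), or None
# (standard failure: the space was fully searched and holds no solution).
def depth_limited_search(graph, node, goal, limit):
    if node == goal:
        return [node]
    if limit == 0:
        return 'cutoff'
    cutoff_occurred = False
    for successor in graph[node]:
        result = depth_limited_search(graph, successor, goal, limit - 1)
        if result == 'cutoff':
            cutoff_occurred = True
        elif result is not None:
            return [node] + result
    return 'cutoff' if cutoff_occurred else None

graph = {'S': ['A', 'B'], 'A': ['C'], 'B': [], 'C': ['G'], 'G': []}
print(depth_limited_search(graph, 'S', 'G', 3))   # ['S', 'A', 'C', 'G']
print(depth_limited_search(graph, 'S', 'G', 2))   # 'cutoff'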

4. Uniform-cost Search Algorithm:

Uniform-cost search is a searching algorithm used for traversing a weighted tree or graph. This
algorithm comes into play when a different cost is available for each edge. The primary goal of
uniform-cost search is to find a path to the goal node which has the lowest cumulative cost.
Uniform-cost search expands nodes according to their path cost from the root node. It can be
used to solve any graph/tree where the optimal cost is in demand. The uniform-cost search
algorithm is implemented with a priority queue. It gives maximum priority to the lowest
cumulative cost. Uniform-cost search is equivalent to the BFS algorithm if the path cost of all edges
is the same.

Advantages:

o Uniform cost search is optimal because at every state the path with the least cost is
chosen.

Disadvantages:

o It does not care about the number of steps involved in the search; it is only concerned with
path cost, due to which this algorithm may get stuck in an infinite loop (e.g. on zero-cost edges).

Example:

Completeness:


Uniform-cost search is complete, such as if there is a solution, UCS will find it.

Time Complexity:

Let C* be the cost of the optimal solution, and ε the minimum cost of a single step toward the goal.
Then the number of steps is C*/ε + 1 (we take +1 because we start from state 0 and end at
C*/ε).

Hence, the worst-case time complexity of uniform-cost search is O(b^(1 + ⌊C*/ε⌋)).

Space Complexity:

By the same logic, the worst-case space complexity of uniform-cost
search is O(b^(1 + ⌊C*/ε⌋)).

Optimal:

Uniform-cost search is always optimal as it only selects a path with the lowest path cost.
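
A minimal sketch of UCS with a priority queue keyed on the cumulative path cost g(n) (the weighted graph is an illustrative assumption):

import heapq

# Uniform-cost search: a priority queue ordered by cumulative path cost,
# so the cheapest known path is always expanded first.
def uniform_cost_search(graph, start, goal):
    frontier = [(0, start, [start])]         # (path cost, node, path)
    best_cost = {start: 0}
    while frontier:
        cost, node, path = heapq.heappop(frontier)
        if node == goal:
            return cost, path
        for successor, step_cost in graph[node].items():
            new_cost = cost + step_cost
            if new_cost < best_cost.get(successor, float('inf')):
                best_cost[successor] = new_cost
                heapq.heappush(frontier, (new_cost, successor, path + [successor]))
    return None

graph = {'S': {'A': 1, 'B': 5}, 'A': {'B': 2, 'C': 4},
         'B': {'G': 1}, 'C': {'G': 3}, 'G': {}}
print(uniform_cost_search(graph, 'S', 'G'))   # (4, ['S', 'A', 'B', 'G'])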

5. Iterative deepening depth-first Search:

The iterative deepening algorithm is a combination of DFS and BFS algorithms. This search
algorithm finds out the best depth limit and does it by gradually increasing the limit until a goal
is found.

This algorithm performs depth-first search up to a certain "depth limit", and it keeps increasing
the depth limit after each iteration until the goal node is found.

This Search algorithm combines the benefits of Breadth-first search's fast search and depth-first
search's memory efficiency.

The iterative deepening search algorithm is a useful uninformed search when the search space is
large and the depth of the goal node is unknown.

Advantages:

o It combines the benefits of the BFS and DFS search algorithms in terms of fast search and
memory efficiency.

Disadvantages:

o The main drawback of IDDFS is that it repeats all the work of the previous phase.

Example:

The following tree structure shows the iterative deepening depth-first search. The IDDFS algorithm
performs successive iterations until it finds the goal node. The iterations performed by the
algorithm are given as:


1'st Iteration-----> A
2'nd Iteration----> A, B, C
3'rd Iteration------>A, B, D, E, C, F, G
4'th Iteration------>A, B, D, H, I, E, C, F, K, G
In the fourth iteration, the algorithm will find the goal node.

Completeness:

This algorithm is complete if the branching factor is finite.

Time Complexity:

Let's suppose b is the branching factor and d is the depth of the shallowest goal; then the
worst-case time complexity is O(b^d).

Space Complexity:

The space complexity of IDDFS will be O(b×d).

Optimal:

The IDDFS algorithm is optimal if the path cost is a non-decreasing function of the depth of the node.
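
A minimal sketch of IDDFS: repeated depth-limited DFS with an increasing limit (the graph mirrors the A, B, C, ... iteration listing above under assumed edges; the DLS helper is inlined so the snippet stands alone):

# Iterative deepening DFS: run a depth-limited DFS with limits 0, 1, 2, ...
# until the goal is found, so the first solution found is a shallowest one.

def dls(graph, node, goal, limit):
    if node == goal:
        return [node]
    if limit == 0:
        return None
    for successor in graph[node]:
        result = dls(graph, successor, goal, limit - 1)
        if result is not None:
            return [node] + result
    return None

def iterative_deepening_search(graph, start, goal, max_depth=20):
    for limit in range(max_depth + 1):       # iteration k searches to depth k
        result = dls(graph, start, goal, limit)
        if result is not None:
            return result
    return None

graph = {'A': ['B', 'C'], 'B': ['D', 'E'], 'C': ['F', 'G'],
         'D': ['H', 'I'], 'E': [], 'F': ['K'],
         'G': [], 'H': [], 'I': [], 'K': []}
print(iterative_deepening_search(graph, 'A', 'K'))   # ['A', 'C', 'F', 'K']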

6. Bidirectional Search Algorithm:

The bidirectional search algorithm runs two simultaneous searches: one from the initial state, called
forward search, and the other from the goal node, called backward search, to find the goal node.
Bidirectional search replaces one single search graph with two small subgraphs, in which one
starts the search from the initial vertex and the other starts from the goal vertex. The search stops when
these two graphs intersect each other.

Bidirectional search can use search techniques such as BFS, DFS, DLS, etc.


Advantages:

o Bidirectional search is fast.


o Bidirectional search requires less memory

Disadvantages:

o Implementation of the bidirectional search tree is difficult.


o In bidirectional search, one should know the goal state in advance.

Example:

In the below search tree, the bidirectional search algorithm is applied. This algorithm divides one
graph/tree into two sub-graphs. It starts traversing from node 1 in the forward direction and from
goal node 16 in the backward direction.

The algorithm terminates at node 9 where two searches meet.

Completeness: Bidirectional search is complete if we use BFS in both searches.

Time Complexity: The time complexity of bidirectional search using BFS is O(b^(d/2)), since each
search only needs to go half the solution depth.

Space Complexity: The space complexity of bidirectional search is O(b^(d/2)).

Optimal: Bidirectional search is Optimal.
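
A minimal sketch of bidirectional BFS (assuming an undirected graph so the backward search can follow edges in reverse; the node numbers echo the 1 → 16 example above with assumed edges):

from collections import deque

# Bidirectional BFS: expand two frontiers, one from the start and one from
# the goal, and stop as soon as they meet; the two half-paths are stitched.
def bidirectional_search(graph, start, goal):
    if start == goal:
        return [start]
    parents_f, parents_b = {start: None}, {goal: None}
    frontier_f, frontier_b = deque([start]), deque([goal])

    def expand(frontier, parents, other):
        node = frontier.popleft()
        for nbr in graph[node]:
            if nbr not in parents:
                parents[nbr] = node
                if nbr in other:            # the two searches meet here
                    return nbr
                frontier.append(nbr)
        return None

    while frontier_f and frontier_b:
        meet = expand(frontier_f, parents_f, parents_b) or \
               expand(frontier_b, parents_b, parents_f)
        if meet:
            path, n = [], meet              # walk back toward the start...
            while n is not None:
                path.append(n)
                n = parents_f[n]
            path.reverse()
            n = parents_b[meet]             # ...then forward toward the goal
            while n is not None:
                path.append(n)
                n = parents_b[n]
            return path
    return None

graph = {1: [2, 3], 2: [1, 4], 3: [1, 4], 4: [2, 3, 9],
         9: [4, 14], 14: [9, 16], 16: [14]}
print(bidirectional_search(graph, 1, 16))   # [1, 2, 4, 9, 14, 16]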

Informed Search Algorithms:

So far we have talked about uninformed search algorithms, which look through the search
space for all possible solutions to the problem without having any additional knowledge about the
search space. An informed search algorithm, by contrast, uses additional knowledge, such as how far we
are from the goal, path cost, how to reach the goal node, etc. This knowledge helps agents
explore less of the search space and find the goal node more efficiently.

The informed search algorithm is more useful for large search space. Informed search algorithm
uses the idea of heuristic, so it is also called Heuristic search.

Heuristic function: A heuristic is a function which is used in informed search; it finds the
most promising path. It takes the current state of the agent as its input and produces an
estimate of how close the agent is to the goal. The heuristic method might not always
give the best solution, but it is guaranteed to find a good solution in reasonable time. The heuristic
function estimates how close a state is to the goal. It is represented by h(n), and it estimates the
cost of an optimal path from the given state to the goal state. The value of the heuristic function is
always positive.

Admissibility of the heuristic function is given as:

h(n) <= h*(n)

where h*(n) is the true cost to reach the goal from n.

Pure Heuristic Search:

Pure heuristic search is the simplest form of heuristic search algorithm. It expands nodes based
on their heuristic value h(n). It maintains two lists, OPEN and CLOSED. In the CLOSED
list, it places those nodes which have already been expanded, and in the OPEN list, it places nodes
which have not yet been expanded.

On each iteration, the node n with the lowest heuristic value is expanded, generating all its
successors, and n is placed on the closed list. The algorithm continues until a goal state is found.

In the informed search we will discuss two main algorithms which are given below:

o Best First Search Algorithm(Greedy search)


o A* Search Algorithm

1.) Best-first Search Algorithm (Greedy Search):

The greedy best-first search algorithm always selects the path which appears best at that moment. It
is the combination of depth-first search and breadth-first search algorithms. It uses the heuristic
function to guide the search. Best-first search allows us to take the advantages of both algorithms. With
the help of best-first search, at each step, we can choose the most promising node. In the best-
first search algorithm, we expand the node which is closest to the goal node, where the closeness
is estimated by the heuristic function, i.e.

f(n) = h(n)

where h(n) = estimated cost from node n to the goal.

The greedy best-first algorithm is implemented using a priority queue.


Best first search algorithm:


o Step 1: Place the starting node into the OPEN list.
o Step 2: If the OPEN list is empty, Stop and return failure.
o Step 3: Remove the node n from the OPEN list which has the lowest value of h(n), and
place it in the CLOSED list.
o Step 4: Expand the node n, and generate the successors of node n.
o Step 5: Check each successor of node n, and find whether any node is a goal node or not.
If any successor node is goal node, then return success and terminate the search, else
proceed to Step 6.
o Step 6: For each successor node, the algorithm checks the evaluation function f(n) and then
checks whether the node is already in the OPEN or CLOSED list. If the node is not in
either list, then add it to the OPEN list.
o Step 7: Return to Step 2.

Advantages:
o Best-first search can switch between BFS and DFS, thus gaining the advantages of both
algorithms.
o This algorithm is more efficient than the BFS and DFS algorithms.

Disadvantages:
o It can behave as an unguided depth-first search in the worst-case scenario.
o It can get stuck in a loop, like DFS.
o This algorithm is not optimal.

Example:

Consider the below search problem, which we will traverse using greedy best-first search. At
each iteration, each node is expanded using the evaluation function f(n) = h(n), which is given in the
below table.


In this search example, we are using two lists, the OPEN and CLOSED lists. Following
are the iterations for traversing the above example.


Expand the nodes of S and put in the CLOSED list

Initialization: Open [A, B], Closed [S]

Iteration 1: Open [A], Closed [S, B]

Iteration 2: Open [E, F, A], Closed [S, B]
             Open [E, A], Closed [S, B, F]

Iteration 3: Open [I, G, E, A], Closed [S, B, F]
             Open [I, E, A], Closed [S, B, F, G]

Hence the final solution path will be: S----> B----->F----> G

Time Complexity: The worst-case time complexity of greedy best-first search is O(b^m).

Space Complexity: The worst-case space complexity of greedy best-first search is O(b^m),
where m is the maximum depth of the search space.

Complete: Greedy best-first search is incomplete, even if the given state space is finite.

Optimal: Greedy best first search algorithm is not optimal.
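
A minimal sketch of greedy best-first search (the graph and h values below are illustrative stand-ins for the missing figure, chosen so the search reproduces the S ----> B -----> F ----> G solution above):

import heapq

# Greedy best-first search: the frontier is a priority queue ordered purely
# by the heuristic h(n); the path cost so far is ignored.
def greedy_best_first_search(graph, h, start, goal):
    frontier = [(h[start], start, [start])]
    closed = set()
    while frontier:
        _, node, path = heapq.heappop(frontier)   # lowest h(n) first
        if node == goal:
            return path
        if node in closed:
            continue
        closed.add(node)
        for successor in graph[node]:
            if successor not in closed:
                heapq.heappush(frontier, (h[successor], successor, path + [successor]))
    return None

graph = {'S': ['A', 'B'], 'A': [], 'B': ['E', 'F'],
         'E': [], 'F': ['I', 'G'], 'I': [], 'G': []}
h = {'S': 13, 'A': 12, 'B': 4, 'E': 8, 'F': 2, 'I': 9, 'G': 0}
print(greedy_best_first_search(graph, h, 'S', 'G'))   # ['S', 'B', 'F', 'G']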

2.) A* Search Algorithm:

A* search is the most commonly known form of best-first search. It uses the heuristic function h(n)
and the cost to reach the node n from the start state, g(n). It combines features of UCS and
greedy best-first search, by which it solves the problem efficiently. The A* search algorithm finds the
shortest path through the search space using the heuristic function. This search algorithm
expands a smaller search tree and provides optimal results faster. The A* algorithm is similar to UCS
except that it uses g(n) + h(n) instead of g(n).

In the A* search algorithm, we use the search heuristic as well as the cost to reach the node. Hence we
can combine both costs as f(n) = g(n) + h(n); this sum is called the fitness number.

Algorithm of A* search:

Step1: Place the starting node in the OPEN list.

Step 2: Check if the OPEN list is empty or not; if the list is empty, then return failure and stop.

Step 3: Select the node from the OPEN list which has the smallest value of the evaluation function
(g + h). If node n is the goal node, then return success and stop; otherwise, go to Step 4.

Step 4: Expand node n, generate all of its successors, and put n into the CLOSED list. For each
successor n', check whether n' is already in the OPEN or CLOSED list; if not, then compute the
evaluation function for n' and place it into the OPEN list.

Step 5: Else, if node n' is already in OPEN or CLOSED, then update its back pointer to
reflect the lowest g(n') value.

Step 6: Return to Step 2.

Advantages:
o The A* search algorithm is better than other search algorithms.
o A* search algorithm is optimal and complete.
o This algorithm can solve very complex problems.

Disadvantages:
o It does not always produce the shortest path, as it is mostly based on heuristics and
approximation.
o A* search algorithm has some complexity issues.
o The main drawback of A* is memory requirement as it keeps all generated nodes in the
memory, so it is not practical for various large-scale problems.


Example:
In this example, we will traverse the given graph using the A* algorithm. The heuristic value of
all states is given in the below table, so we will calculate f(n) for each state using the formula
f(n) = g(n) + h(n), where g(n) is the cost to reach the node from the start state.

Here we will use OPEN and CLOSED list.

Solution:

Initialization: {(S, 5)}

Iteration1: {(S--> A, 4), (S-->G, 10)}

Iteration2: {(S--> A-->C, 4), (S--> A-->B, 7), (S-->G, 10)}

Iteration3: {(S--> A-->C--->G, 6), (S--> A-->C--->D, 11), (S--> A-->B, 7), (S-->G, 10)}

Iteration 4 will give the final result, S--->A--->C--->G; it provides the optimal path with cost
6.


Points to remember:

o The A* algorithm returns the path which occurs first, and it does not search all
remaining paths.
o The efficiency of the A* algorithm depends on the quality of the heuristic.
o The A* algorithm expands all nodes which satisfy the condition f(n) < C*, where C* is the
cost of the optimal solution.

Complete: A* algorithm is complete as long as:

o Branching factor is finite.


o The cost of every action is fixed.

Optimal: A* search algorithm is optimal if it follows below two conditions:

o Admissible: The first condition required for optimality is that h(n) should be an
admissible heuristic for A* tree search. An admissible heuristic is optimistic in nature.
o Consistency: The second required condition is consistency, which applies only to A* graph search.

If the heuristic function is admissible, then A* tree search will always find the least cost path.

Time Complexity: The time complexity of the A* search algorithm depends on the heuristic function,
and the number of nodes expanded is exponential in the depth of the solution d. So the time
complexity is O(b^d), where b is the branching factor.

Space Complexity: The space complexity of A* search algorithm is O(b^d)


AL3391 ARTIFICIAL INTELLIGENCE


UNIT II

1. Heuristic Search Strategies:

What is Heuristics?

A heuristic is a technique that is used to solve a problem faster than the classic methods, or to find an approximate solution when classic methods cannot. Heuristics are problem-solving techniques that result in practical and quick solutions. Those solutions may not always be optimal, but they are sufficient within a given, limited timeframe.

Why do we need heuristics?

Heuristics are used in situations where a short-term solution is required. When facing complex situations with limited resources and time, heuristics can help companies make quick decisions through shortcuts and approximate calculations. Most heuristic methods involve mental shortcuts based on past experience.

A heuristic method might not always provide the best solution, but it is assured to find a good solution in a reasonable time.

Depending on the context, different heuristic methods correlate with the problem's scope. The most common heuristic methods are trial and error, guesswork, the process of elimination, and historical data analysis. These methods use readily available information that is not specific to the problem but is broadly applicable. They include the representative, affect, and availability heuristics.

Heuristic search techniques in AI (Artificial Intelligence)

Heuristic search techniques can be divided into two categories:


Direct Heuristic Search techniques in AI

It includes Blind Search, Uninformed Search, and Blind control strategy. These search techniques are not always feasible, as they require much memory and time. They search the complete state space for a solution and use an arbitrary ordering of operations.

The examples of Direct Heuristic search techniques include Breadth-First Search (BFS) and
Depth First Search (DFS).

Weak Heuristic Search techniques in AI

It includes Informed Search, Heuristic Search, and Heuristic control strategy. These techniques
are helpful when they are applied properly to the right types of tasks. They usually require
domain-specific information.

The examples of Weak Heuristic search techniques include Best First Search (BFS) and A*.

Before describing certain heuristic techniques, let's see some of the techniques listed below:

o Bidirectional Search
o A* search
o Simulated Annealing
o Hill Climbing
o Best First search
o Beam search

First, let's talk about hill climbing in Artificial Intelligence.

Hill Climbing Algorithm

It is a technique for optimizing mathematical problems. Hill climbing is widely used when a
good heuristic is available.

It is a local search algorithm that continuously moves in the direction of increasing


elevation/value to find the mountain's peak or the best solution to the problem. It terminates
when it reaches a peak value where no neighbor has a higher value. The Traveling Salesman Problem is one of the widely discussed examples of the hill climbing algorithm, in which we need to minimize the distance traveled by the salesman.

It is also called greedy local search, as it looks only at its immediate good neighbor state and not beyond. The steps of a simple hill-climbing algorithm are listed below:

Step 1: Evaluate the initial state. If it is the goal state, then return success and Stop.

Step 2: Loop Until a solution is found or there is no new operator left to apply.


Step 3: Select and apply an operator to the current state.

Step 4: Check new state:

If it is a goal state, then return success and quit.

Else if it is better than the current state, then assign the new state as the current state.

Else if it is not better than the current state, then return to step 2.

Step 5: Exit.
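A minimal Python sketch of simple hill climbing, following the steps above; neighbours(state) and value(state) are assumed, problem-specific functions, not part of the original notes.

def hill_climbing(initial, neighbours, value):
    current = initial
    while True:
        improved = False
        for nxt in neighbours(current):         # apply operators one by one
            if value(nxt) > value(current):     # first better neighbour wins
                current = nxt                   # the new state becomes current
                improved = True
                break
        if not improved:                        # no operator improves: stop
            return current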

Best first search (BFS)

This algorithm always chooses the path which appears best at the moment. It is a combination of the depth-first search and breadth-first search algorithms. It lets us take the benefit of both algorithms. It uses a heuristic function to guide the search: with the help of best-first search, at each step, we can choose the most promising node.

Best first search algorithm:

Step 1: Place the starting node into the OPEN list.

Step 2: If the OPEN list is empty, Stop and return failure.

Step 3: Remove the node n with the lowest value of h(n) from the OPEN list, and place it in the CLOSED list.

Step 4: Expand the node n, and generate the successors of node n.

Step 5: Check each successor of node n, and find whether any node is a goal node or not. If any
successor node is the goal node, then return success and stop the search; else continue to the next step.

Step 6: For each successor node, the algorithm computes the evaluation function f(n) and then checks whether the node is in either the OPEN or the CLOSED list. If the node is in neither list, then add it to the OPEN list.

Step 7: Return to Step 2.
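A minimal Python sketch of this algorithm, reusing the same hypothetical graph representation as the A* sketch above; nodes are ordered purely by the heuristic h(n), which is what makes this a greedy best-first search.

import heapq

def best_first_search(graph, h, start, goal):
    open_list = [(h[start], start, [start])]      # ordered by h(n) only
    closed = set()
    while open_list:
        _, node, path = heapq.heappop(open_list)  # lowest h(n) first
        if node == goal:
            return path
        if node in closed:
            continue
        closed.add(node)
        for succ, _cost in graph.get(node, []):
            if succ not in closed:
                heapq.heappush(open_list, (h[succ], succ, path + [succ]))
    return None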

A* Search Algorithm

A* search is the most commonly known form of best-first search. It uses the heuristic function h(n) and the cost to reach node n from the start state, g(n). It combines the features of UCS and greedy best-first search, by which it solves the problem efficiently.


It finds the shortest path through the search space using the heuristic function. This search algorithm expands fewer nodes of the search tree and gives optimal results faster.

Algorithm of A* search:

Step 1: Place the starting node in the OPEN list.

Step 2: Check if the OPEN list is empty or not. If the list is empty, then return failure and stop.

Step 3: Select the node from the OPEN list which has the smallest value of the evaluation function (g + h). If node n is the goal node, then return success and stop; otherwise, go to the next step.

Step 4: Expand node n and generate all of its successors, and put n into the closed list. For each
successor n', check whether n' is already in the OPEN or CLOSED list. If not, then compute the
evaluation function for n' and place it into the Open list.

Step 5: Else, if node n' is already in OPEN or CLOSED, then it should be attached to the back pointer which reflects the lowest g(n') value.

Step 6: Return to Step 2.

Examples of heuristics in everyday life

Here are some real-life examples of heuristics that people use as a way to solve a problem:

o Common sense: It is a heuristic that is used to solve a problem based on the observation
of an individual.
o Rule of thumb: In heuristics, we also use the term "rule of thumb." This heuristic allows an individual to make an approximation without doing an exhaustive search.
o Working backward: It lets an individual solve a problem by assuming that the problem has already been solved and by working backward in their minds to see how such a solution could have been reached.
o Availability heuristic: It allows a person to judge a situation based on the examples of
similar situations that come to mind.


o Familiarity heuristic: It allows a person to approach a problem on the basis that they are familiar with the situation, so one should act in the same way they acted in a similar situation before.
o Educated guess: It allows a person to reach a conclusion without doing an exhaustive search. Using it, a person considers what they have observed in the past and applies that history to a situation for which no definite answer has been decided yet.

Types of heuristics

There are various types of heuristics, including the availability heuristic, the affect heuristic, and the representative heuristic. Each heuristic type plays a role in decision-making. Let's discuss the availability, affect, and representative heuristics.

Availability heuristic

The availability heuristic is the judgment that people make regarding the likelihood of an event, based on information that quickly comes to mind. When making decisions, people typically rely on their past knowledge or experience of the event. It allows a person to judge a situation based on the examples of similar situations that come to mind.

Representative heuristic

It occurs when we evaluate an event's probability on the basis of its similarity with another event.

Example: We can understand the representative heuristic with the example of product packaging, as consumers tend to associate a product's quality with its external packaging. If a company packages its product so that it reminds you of a high-quality, well-known product, consumers will assume that the product has the same quality as the branded product.

So, instead of evaluating the product based on its quality, customers judge the product's quality based on the similarity in packaging.

Affect heuristic

It is based on the negative and positive feelings that are linked with a certain stimulus. It involves quick feelings based on past beliefs. The theory is that one's emotional response to a stimulus can affect the decisions an individual takes.

When people take little time to evaluate a situation carefully, they tend to base their decisions on their emotional response.

Example: The affect heuristic can be understood by the example of advertisements.


Advertisements can influence the emotions of consumers, so they affect a consumer's purchasing decision. The most common examples are fast-food advertisements. When fast-food companies run an advertisement, they hope to evoke a positive emotional response that pushes you to view their products positively.


If someone carefully analyzes the benefits and risks of consuming fast food, they might decide that fast food is unhealthy. But people rarely take the time to evaluate everything they see and generally make decisions based on their automatic emotional response. So, fast-food companies present advertisements that rely on the affect heuristic to generate a positive emotional response, which results in sales.

Limitation of heuristics

Along with the benefits, heuristic also has some limitations.

o Although heuristics speed up our decision-making process and help us solve problems, they can also introduce errors: just because something has worked in the past does not mean that it will work again.
o It becomes hard to find alternative solutions or ideas if we always rely on the existing solutions or heuristics.

2. Heuristic Functions in Artificial Intelligence:

Heuristic Functions in AI: As we have already seen, an informed search makes use of heuristic functions in order to reach the goal node in a more directed way. There are usually several pathways in a search tree from the current node to the goal node, so the selection of a good heuristic function certainly matters. A good heuristic function is judged by its efficiency: the more problem-specific information it captures, the fewer nodes the search needs to expand, although the heuristic itself may take longer to compute.
Some toy problems, such as 8-puzzle, 8-queen, tic-tac-toe, etc., can be solved more efficiently
with the help of a heuristic function. Let's see how.
Consider the following 8-puzzle problem, where we have a start state and a goal state. Our task is to slide the tiles of the current/start state into the order followed in the goal state. A tile can move in four directions: left, right, up, or down. There can be several ways to convert the current/start state to the goal state, but we can use a heuristic function h(n) to solve the problem more efficiently.

A heuristic function for the 8-puzzle problem is defined below:


h(n) = number of tiles out of position.
Here, a total of three tiles are out of position, i.e., 6, 5, and 4 (the empty tile is not counted), so h(n) = 3. Now we need to reduce the value of h(n) to 0.
We can construct a state-space tree to minimize the h(n) value to 0, as shown below:


It is seen from the above state-space tree that the goal state is reached as h(n) is reduced from 3 to 0. We can create and use several heuristic functions as per the requirement. It is also clear from the above example that a heuristic function h(n) expresses the information required to solve a given problem more efficiently. The information can be related to the nature of the state, the cost of transforming from one state to another, the characteristics of the goal node, etc.
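A tiny Python sketch of the misplaced-tiles heuristic described above. The flattened start and goal layouts are hypothetical stand-ins (not the notes' figure), chosen so that exactly tiles 6, 5, and 4 are out of position, mirroring h(n) = 3.

def h_misplaced(state, goal):
    # Count tiles out of position; 0 denotes the empty tile and is not counted.
    return sum(1 for s, g in zip(state, goal) if s != 0 and s != g)

start = (1, 2, 3, 8, 4, 0, 7, 6, 5)   # rows flattened left to right (assumed)
goal  = (1, 2, 3, 8, 6, 4, 7, 5, 0)
print(h_misplaced(start, goal))        # prints 3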

3. Local Search Algorithms and Optimization Problem:


Informed and uninformed search expand the nodes systematically in two ways:

 keeping different paths in memory, and

 selecting the best suitable path,

which leads to a solution state required to reach the goal node. But beyond these "classical search algorithms," we have some "local search algorithms" in which the path cost does not matter; they focus only on the solution state needed to reach the goal node.
A local search algorithm completes its task by working with a single current node rather than multiple paths, and it generally moves only to the neighbors of that node.


Although local search algorithms are not systematic, still they have the following two
advantages:

 Local search algorithms use very little (often a constant amount of) memory, as they operate on a single path only.
 Most often, they find a reasonable solution in large or infinite state spaces where the classical or systematic algorithms do not work.

Does the local search algorithm work for a pure optimization problem?
Yes, the local search algorithm works for pure optimization problems. A pure optimization problem is one in which all the nodes can give a solution, and the target is to find the best state of all according to the objective function. However, a pure optimization formulation by itself does not say how to find a high-quality solution on the way from the current state to the goal state.
Note: An objective function is a function whose value is either minimized or maximized in
different contexts of the optimization problems. In the case of search algorithms, an objective
function can be the path cost for reaching the goal node, etc.
Working of a Local search algorithm
Let's understand the working of a local search algorithm with the help of an example:
Consider the below state-space landscape having both:

 Location: It is defined by the state.


 Elevation: It is defined by the value of the objective function or heuristic cost function.

The local search algorithm explores the above landscape by finding the following two points:

 Global Minimum: If the elevation corresponds to a cost, then the task is to find the lowest valley, which is known as the Global Minimum.
 Global Maximum: If the elevation corresponds to an objective function, then the task is to find the highest peak, which is called the Global Maximum. It is the highest point of the landscape.

We will understand the working of these points better in Hill-climbing search.


Below are some different types of local searches:


 Hill-climbing Search
 Simulated Annealing
 Local Beam Search

Hill Climbing Algorithm in AI


Hill Climbing Algorithm: Hill climbing search is a local search algorithm. The purpose of the hill climbing search is to climb a hill and reach its topmost peak/point. It is based on the heuristic search technique, in which the person climbing up the hill estimates the direction that will lead to the highest peak.
State-space Landscape of Hill climbing algorithm
To understand the concept of hill climbing algorithm, consider the below landscape representing
the goal state/peak and the current state of the climber. The topographical regions shown in the
figure can be defined as:

 Global Maximum: It is the highest point on the hill, which is the goal state.
 Local Maximum: It is a peak that is higher than its neighboring states but lower than the global maximum.
 Flat local maximum: It is the flat area over the hill where it has no uphill or downhill. It
is a saturated point of the hill.
 Shoulder: It is also a flat area where the summit is possible.
 Current state: It is the current position of the person.

Types of Hill climbing search algorithm


There are following types of hill-climbing search:

 Simple hill climbing


 Steepest-ascent hill climbing
 Stochastic hill climbing
 Random-restart hill climbing


Simple hill climbing search


Simple hill climbing is the simplest technique for climbing a hill. The task is to reach the highest peak of the mountain. Here, the movement of the climber depends on his steps: if he finds his next step better than the previous one, he continues to move; otherwise, he remains in the same state. This search focuses only on the previous and the next step.
Simple hill climbing Algorithm

1. Create a CURRENT node, a NEIGHBOUR node, and a GOAL node.

2. If the CURRENT node = GOAL node, return GOAL and terminate the search.
3. Else, if the NEIGHBOUR node is better than the CURRENT node, set CURRENT node = NEIGHBOUR node and move ahead.
4. Loop until the goal is reached or no new operator is left to apply.

Steepest-ascent hill climbing


Steepest-ascent hill climbing is different from simple hill climbing search. Unlike simple hill climbing, it considers all the successor nodes, compares them, and chooses the node closest to the solution. Steepest-ascent hill climbing is similar to best-first search because it evaluates every successor node instead of just one.
Note: Both simple and steepest-ascent hill climbing search fail when there is no closer node.
Steepest-ascent hill climbing algorithm

1. Create a CURRENT node and a GOAL node.

2. If the CURRENT node = GOAL node, return GOAL and terminate the search.
3. Loop until a better node is no longer found to reach the solution.
4. If a better successor node is present, expand it.
5. When the GOAL is attained, return GOAL and terminate.
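A minimal Python sketch of the steepest-ascent variant: unlike the simple version, it examines all successors and moves to the best one. neighbours and value are again assumed, problem-specific functions.

def steepest_ascent(initial, neighbours, value):
    current = initial
    while True:
        # Compare *all* successors and pick the best one.
        best = max(neighbours(current), key=value, default=current)
        if value(best) <= value(current):   # no successor improves: stop
            return current
        current = best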

Stochastic hill climbing


Stochastic hill climbing does not examine all the nodes. It selects one node at random and decides whether to move to it or to search for a better one.
Random-restart hill climbing
The random-restart algorithm is based on a try-and-try-again strategy. It iteratively searches the space, selecting the best node at each step, until the goal is found. Success most commonly depends on the shape of the hill: if there are few plateaus, local maxima, and ridges, it becomes easy to reach the destination.


Limitations of Hill climbing algorithm


The hill climbing algorithm is a fast but greedy approach. It finds a solution state rapidly because it is quite easy to improve a bad state. However, this search has the following limitations:

 Local Maxima: It is a peak of the landscape that is higher than all of its neighboring states but lower than the global maximum. It is not the goal peak, because there is another peak higher than it.

 Plateau: It is a flat surface area where no uphill exists. It becomes difficult for the climber to decide in which direction to move to reach the goal point. Sometimes, the person gets lost in the flat area.

 Ridges: A challenging region in which two or more local maxima of the same height commonly occur. It becomes difficult for the person to navigate to the right point, and he may get stuck on the ridge itself.


Simulated Annealing
Simulated annealing is similar to the hill climbing algorithm. It works on the current situation, but it picks a random move instead of the best move. If the move improves the current situation, it is always accepted as a step towards the solution state; otherwise, the algorithm accepts the move with some probability less than 1. This search technique was first used in 1980 to solve VLSI layout problems. It is also applied to factory scheduling and other large optimization tasks.
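A minimal Python sketch of simulated annealing as described above. neighbours(s), value(s) (to be maximized), and the cooling schedule(t) are assumed, problem-specific ingredients; neighbours is assumed to return a non-empty list.

import math, random

def simulated_annealing(initial, neighbours, value, schedule, steps=10000):
    current = initial
    for t in range(1, steps + 1):
        T = schedule(t)                            # temperature at step t
        if T <= 1e-12:
            return current                         # frozen: stop
        nxt = random.choice(neighbours(current))   # a random move, not the best
        delta = value(nxt) - value(current)
        # Improvements are always accepted; worsening moves are accepted
        # with probability e^(delta/T), which is less than 1.
        if delta > 0 or random.random() < math.exp(delta / T):
            current = nxt
    return current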
Local Beam Search
Local beam search is quite different from random-restart search. It keeps track of k states instead
of just one. It selects k randomly generated states and expands them at each step. If any state is a goal state, the search stops with success. Otherwise, it selects the best k successors from the complete list and repeats the same process. In random-restart search, each search process runs independently, whereas in local beam search, useful information is shared between the parallel search processes.
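A minimal Python sketch of local beam search with k states; random_state, successors, value, and is_goal are assumed, problem-specific functions.

import heapq

def local_beam_search(random_state, successors, value, is_goal, k=4, steps=100):
    beam = [random_state() for _ in range(k)]     # k randomly generated states
    for _ in range(steps):
        pool = []
        for s in beam:
            for nxt in successors(s):
                if is_goal(nxt):
                    return nxt                    # stop with success
                pool.append(nxt)
        if not pool:
            break
        # Information is shared: the k best successors of the whole pool survive.
        beam = heapq.nlargest(k, pool, key=value)
    return max(beam, key=value)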
Disadvantages of Local Beam search

 This search can suffer from a lack of diversity among the k states.
 It is an expensive version of hill climbing search.

4. Local search in continuous space:


5. Search with non-deterministic actions:


6. Search in partial observable environments:


7. Online Search Agents and Unknown Environments:


AL3391 ARTIFICIAL INTELLIGENCE


UNIT III

1. Game Theory:

Game theory is basically a branch of mathematics that is used to model strategic interaction between different players (agents), all of which are equally rational, in a context with predefined rules (of playing or maneuvering) and outcomes. Every player or agent is a rational entity who is selfish and tries to maximize the reward to be obtained using a particular strategy. All the players abide by certain rules in order to receive a predefined payoff: a reward given after a certain outcome. Hence, a GAME can be defined as a set of players, actions, strategies, and a final payoff for which all the players are competing.
Game Theory has now become a describing factor for both Machine Learning algorithms and
many daily life situations.

Consider the SVM (Support Vector Machine), for instance. According to Game Theory, the SVM is a game between 2 players in which one player challenges the other to find the best hyper-plane after providing the most difficult points for classification. The final payoff of this game is a solution that will be a trade-off between the strategic abilities of the two competing players.

Nash equilibrium:
Nash equilibrium can be considered the essence of Game Theory. It is basically a state, a point
of equilibrium of collaboration of multiple players in a game. Nash Equilibrium guarantees
maximum profit to each player.
Let us try to understand this with the help of Generative Adversarial Networks (GANs).
What is GAN?

It is a combination of two neural networks: the Discriminator and the Generator. The Generator Neural Network is fed input images, which it analyzes in order to produce new sample images that are made to resemble the actual input images as closely as possible. Once the images have been produced, they are sent to the Discriminator Neural Network. This neural network judges the images sent to it and classifies them as generated images or actual input images. If a generated image is classified as an original image, the discriminator changes its judging parameters. If the image is classified as a generated image, it is rejected and returned to the generator, which then alters its parameters in order to improve the quality of the images it produces.
This competitive process goes on until neither neural network needs to change its parameters and no further improvement is possible for either network. This state of no further improvement is known as NASH EQUILIBRIUM. In other words, GAN is a 2-player competitive game where both players continuously optimize themselves to find a Nash Equilibrium.


But how do we know if the game has reached Nash Equilibrium?

In any game, one of the agents is required to disclose their strategy in front of the other agents.
After the revelation, if none of the players changes their strategies, it is understood that the
game has reached Nash Equilibrium.

Now that we are aware of the basics of Game Theory, let us try to understand how Nash
Equilibrium is attained in a simultaneous game. There are many examples but the most famous
is the Prisoner's Dilemma. There are some more examples, such as the Closed-bag Exchange Game, the Friend or Foe Game, and the iterated Snowdrift Game.

In all these games, two players are involved, and the final payoff is the result of a decision that has to be made by both players. Both players have to choose between defection and co-operation. If both players cooperate, the final payoff will be positive for both. However, if both defect, the final payoff will be negative for both players. If one player defects and the other co-operates, the final payoff will be positive for one and negative for the other.

Here, Nash Equilibrium plays an important role. Only if both players work out a strategy that benefits each other and provides both with a positive payoff will the solution to this problem be optimal.

There are many more real examples and a number of pieces of code that try to solve this
dilemma. The basic essence, however, is the attainment of the Nash Equilibrium in an
uncomfortable situation.
Where is GAME THEORY now?

Game Theory is increasingly becoming a part of the real-world in its various applications in
areas like public health services, public safety, and wildlife. Currently, game theory is being
used in adversary training in GANs, multi-agent systems, and imitation and reinforcement
learning. In the case of perfect-information and symmetric games, many Machine Learning and Deep Learning techniques are applicable. The real challenge lies in the development of techniques to handle incomplete-information games, such as Poker. The complexity of such a game lies in the fact that there are too many combinations of cards, together with uncertainty about the cards held by the various players.

Types of Games:
Currently, there are about five classifications of games. They are as follows:
1. Zero-Sum and Non-Zero-Sum Games: In non-zero-sum games, there are multiple players, and all of them have the option to gain a benefit from any move by another player. In zero-sum games, however, if one player gains something, the other players are bound to lose an equivalent payoff.
2. Simultaneous and Sequential Games: Sequential games are the more popular games, in which every player is aware of the moves of the other players. Simultaneous games are more difficult, as the players move concurrently. Board games are the perfect example of sequential games and are also referred to as turn-based or extensive-form games.

3. Imperfect Information and Perfect Information Games: In a perfect information game, every player can observe the complete state of the game and all moves made so far (as in chess). In an imperfect information game, some information is hidden from the players (as in most card games).


4. Asymmetric and Symmetric Games: Asymmetric games are those in which each player has a different and usually conflicting final goal. Symmetric games are those in which all players have the same ultimate goal, but the strategy used by each may be completely different.
5. Co-operative and Non-Co-operative Games: In non-co-operative games, every player plays
for himself while in co-operative games, players form alliances in order to achieve the final
goal.

2. Optimal Decisions in Games:

Humans’ intellectual capacities have been engaged by games for as long as civilization has
existed, sometimes to an alarming degree. Games are an intriguing subject for AI researchers
because of their abstract character. A game’s state is simple to depict, and actors are usually
limited to a small number of actions with predetermined results. Physical games, such as
croquet and ice hockey, contain significantly more intricate descriptions, a much wider variety
of possible actions, and rather ambiguous regulations defining the legality of activities. With
the exception of robot soccer, these physical games have not piqued the AI community’s
interest.
Games are usually intriguing because they are difficult to solve. Chess, for example, has an
average branching factor of around 35, and games frequently stretch to 50 moves per player,
therefore the search tree has roughly 35^100, or about 10^154, nodes (despite the search graph having "only" about 10^40 unique nodes). As a result, games, like the real world, necessitate the ability
to make some sort of decision even when calculating the best option is impossible.
Inefficiency is also heavily punished in games. Whereas a half-efficient implementation of A* search will merely take twice as long to complete, a chess program that is half as efficient in utilizing its available time will almost certainly be beaten to death, all other factors being equal. As a result of this research, a number of intriguing suggestions for making the most use of time have emerged.

Optimal Decision Making in Games

Let us start with games with two players, whom we’ll refer to as MAX and MIN for obvious
reasons. MAX is the first to move, and then they take turns until the game is finished. At the
conclusion of the game, the victorious player receives points, while the loser receives
penalties. A game can be formalized as a type of search problem that has the following
elements:
 S0: The initial state of the game, which describes how it is set up at the start.
 Player (s): Defines which player in a state has the move.
 Actions (s): Returns a state’s set of legal moves.
 Result (s, a): A transition model that defines a move’s outcome.
 Terminal-Test (s): A terminal test that returns true if the game is over but false otherwise.
Terminal states are those in which the game has come to a conclusion.
 Utility (s, p): A utility function (also known as a payoff function or objective function)
determines the final numeric value for a game that concludes in the terminal state s for
player p. The result in chess is a win, a loss, or a draw, with values of +1, 0, or 1/2.
Backgammon’s payoffs range from 0 to +192, but certain games have a greater range of
possible outcomes. A zero-sum game is defined (confusingly) as one in which the total
reward to all players is the same for each game instance. Chess is a zero-sum game because
each game has a payoff of 0 + 1, 1 + 0, or 1/2 + 1/2. “Constant-sum” would have been a
preferable name, 22 but zero-sum is the usual term and makes sense if each participant is
charged 1.

The game tree for the game is defined by the beginning state, ACTIONS function, and
RESULT function—a tree in which the nodes are game states and the edges represent
movements. The figure below depicts a portion of the tic-tac-toe game tree (noughts and
crosses). MAX may make nine different maneuvers from his starting position. The game
alternates between MAX placing an X and MIN placing an O until we reach leaf nodes
corresponding to terminal states, such as one player having three in a row or all of the
squares being filled. The utility value of the terminal state from the perspective of MAX is
shown by the number on each leaf node; high values are thought to be beneficial for MAX
and bad for MIN


The game tree for tic-tac-toe is relatively small, with fewer than 9! = 362,880 terminal nodes. However, because there are over 10^40 nodes in chess, the game tree is better viewed as a
theoretical construct that cannot be realized in the actual world. But, no matter how big the
game tree is, MAX’s goal is to find a solid move. A tree that is superimposed on the whole
game tree and examines enough nodes to allow a player to identify what move to make is
referred to as a search tree.

A sequence of actions leading to a goal state—a terminal state that is a win—would be the best
solution in a typical search problem. MIN has something to say about it in an adversarial
search. MAX must therefore devise a contingent strategy that specifies MAX's move in the initial state, then MAX's moves in the states resulting from every conceivable MIN response,
then MAX’s moves in the states resulting from every possible MIN reaction to those moves,
and so on. This is quite similar to the AND-OR search method, with MAX acting as OR and
MIN acting as AND. When playing against an infallible opponent, an optimal strategy produces results that are at least as good as those of any other plan. We'll start by demonstrating how to find the best plan.
We’ll move to the trivial game in the figure below since even a simple game like tic-tac-toe is
too complex for us to draw the full game tree on one page. MAX’s root node moves are
designated by the letters a1, a2, and a3. MIN’s probable answers to a1 are b1, b2, b3, and so
on. This game ends after MAX and MIN each make one move. (In game terms, this tree is one move deep, consisting of two half-moves, each of which is referred to as a ply.) The terminal states in this game have utility values ranging from 2 to 14.

Game’s Utility Function

The optimal strategy can be determined from the minimax value of each node, which we write as MINIMAX(n), given a game tree. Assuming that both players play optimally from there to the end of the game, the utility (for MAX) of being in the corresponding state is the node's minimax value. The minimax value of a terminal state is obviously its utility. Furthermore, given the option, MAX prefers to move to a state of maximum value, whereas MIN prefers to move to a state of minimum value. So here's what we've got:

MINIMAX(s) =
  UTILITY(s)                                          if TERMINAL-TEST(s)
  max over a in ACTIONS(s) of MINIMAX(RESULT(s, a))   if PLAYER(s) = MAX
  min over a in ACTIONS(s) of MINIMAX(RESULT(s, a))   if PLAYER(s) = MIN

Let's use these definitions to analyze the game tree shown in the figure above. The game's UTILITY function assigns utility values to the terminal nodes on the bottom level. Because the first MIN node, B, has three successor states with values of 3, 12, and 8, its minimax value is 3. The other two MIN nodes each have a minimax value of 2. The root node is a MAX node; its successors have minimax values of 3, 2, and 2, so the root's minimax value is 3. We can also identify the minimax decision at the root: action a1 is the best option for MAX since it leads to the state with the highest minimax value.
This concept of optimal MAX play assumes that MIN also plays optimally: it maximizes MAX's worst-case outcome. What happens if MIN isn't playing optimally? Then it is easy to show that MAX can do even better. Other strategies may outperform the minimax strategy against suboptimal opponents, but they will necessarily do worse against optimal opponents.
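A minimal Python sketch of the minimax computation defined above. The game object, with methods terminal_test, utility, player, actions, and result, is an assumed interface mirroring the elements PLAYER, ACTIONS, RESULT, TERMINAL-TEST, and UTILITY listed earlier.

def minimax(state, game):
    if game.terminal_test(state):
        return game.utility(state)
    values = [minimax(game.result(state, a), game) for a in game.actions(state)]
    return max(values) if game.player(state) == "MAX" else min(values)

def minimax_decision(state, game):
    # MAX picks the action leading to the successor of highest minimax value.
    return max(game.actions(state),
               key=lambda a: minimax(game.result(state, a), game))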

3. Alpha-Beta Pruning Search:

o Alpha-beta pruning is a modified version of the minimax algorithm. It is an


optimization technique for the minimax algorithm.


o As we saw with the minimax search algorithm, the number of game states it has to examine is exponential in the depth of the tree. We cannot eliminate the exponent, but we can effectively cut it in half. There is a technique by which we can compute the correct minimax decision without checking every node of the game tree; this technique is called pruning. It involves two threshold parameters, alpha and beta, for future expansion, so it is called alpha-beta pruning. It is also called the Alpha-Beta Algorithm.
o Alpha-beta pruning can be applied at any depth of a tree, and sometimes it prunes not only the tree's leaves but entire sub-trees.
o The two-parameter can be defined as:
a. Alpha: The best (highest-value) choice we have found so far at any point along
the path of Maximizer. The initial value of alpha is -∞.
b. Beta: The best (lowest-value) choice we have found so far at any point along the
path of Minimizer. The initial value of beta is +∞.
Alpha-beta pruning returns the same move as the standard minimax algorithm does, but it removes all the nodes that do not really affect the final decision and only make the algorithm slow. Pruning these nodes makes the algorithm fast.

Condition for Alpha-beta pruning:

The main condition required for alpha-beta pruning is:

α >= β

Key points about alpha-beta pruning:


o The Max player will only update the value of alpha.
o The Min player will only update the value of beta.
o While backtracking the tree, the node values will be passed to upper nodes instead of
values of alpha and beta.
o We will only pass the alpha, beta values to the child nodes.

Working of Alpha-Beta Pruning:

Let's take an example of a two-player search tree to understand the working of alpha-beta pruning.

Step 1: The Max player starts with the first move from node A, where α = -∞ and β = +∞. These values of alpha and beta are passed down to node B, where again α = -∞ and β = +∞, and node B passes the same values to its child D.


Step 2: At node D, the value of α is calculated, as it is Max's turn. The value of α is compared first with 2 and then with 3; max(2, 3) = 3 becomes the value of α at node D, and the node value is also 3.

Step 3: The algorithm now backtracks to node B, where the value of β changes, as this is Min's turn. β = +∞ is compared with the available successor node value: min(∞, 3) = 3; hence at node B, α = -∞ and β = 3.


In the next step, the algorithm traverses the next successor of node B, which is node E, and the values α = -∞ and β = 3 are passed down as well.

Step 4: At node E, Max takes its turn, and the value of alpha changes. The current value of alpha is compared with 5, so max(-∞, 5) = 5; hence at node E, α = 5 and β = 3. Since α >= β, the right successor of E is pruned, the algorithm does not traverse it, and the value at node E becomes 5.


Step 5: In the next step, the algorithm again backtracks the tree, from node B to node A. At node A, the value of alpha changes: the maximum available value is 3, as max(-∞, 3) = 3, and β = +∞. These two values are now passed to the right successor of A, which is node C.

At node C, α = 3 and β = +∞, and the same values are passed on to node F.

Step 6: At node F, the value of α is again compared, first with the left child, which is 0: max(3, 0) = 3; and then with the right child, which is 1: max(3, 1) = 3. α remains 3, but the node value of F becomes 1.


Step 7: Node F returns the node value 1 to node C. At C, α = 3 and β = +∞; here the value of beta changes, being compared with 1: min(∞, 1) = 1. Now at C, α = 3 and β = 1, and again the condition α >= β is satisfied, so the next child of C, which is G, is pruned, and the algorithm does not compute the entire sub-tree G.


Step 8: C now returns the value 1 to A, where the best value for A is max(3, 1) = 3. The final game tree shows which nodes were computed and which were never computed (pruned). The optimal value for the maximizer in this example is 3.
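A minimal Python sketch of minimax with alpha-beta pruning, mirroring the walkthrough above; the game interface is the same assumed one as in the minimax sketch earlier.

import math

def alphabeta(state, game, alpha=-math.inf, beta=math.inf):
    if game.terminal_test(state):
        return game.utility(state)
    if game.player(state) == "MAX":
        value = -math.inf
        for a in game.actions(state):
            value = max(value, alphabeta(game.result(state, a), game, alpha, beta))
            alpha = max(alpha, value)       # only the Max player updates alpha
            if alpha >= beta:
                break                       # prune the remaining successors
        return value
    else:
        value = math.inf
        for a in game.actions(state):
            value = min(value, alphabeta(game.result(state, a), game, alpha, beta))
            beta = min(beta, value)         # only the Min player updates beta
            if alpha >= beta:
                break                       # prune the remaining successors
        return value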

Move Ordering in Alpha-Beta pruning:

The effectiveness of alpha-beta pruning is highly dependent on the order in which each
node is examined. Move order is an important aspect of alpha-beta pruning.

It can be of two types:

Worst ordering: In some cases, the alpha-beta pruning algorithm does not prune any of the leaves of the tree and works exactly like the minimax algorithm. In this case, it also consumes more time because of the alpha-beta bookkeeping; such an ordering is called worst ordering. Here, the best move occurs on the right side of the tree. The time complexity for such an order is O(b^m).

Ideal ordering: The ideal ordering for alpha-beta pruning occurs when a lot of pruning happens in the tree and the best moves occur on the left side of the tree. We apply DFS, so it searches the left side of the tree first and can go twice as deep as the minimax algorithm in the same amount of time. The complexity with ideal ordering is O(b^(m/2)).

Rules to find good ordering:

Following are some rules to find good ordering in alpha-beta pruning:

o Make the best move occur at the shallowest node.

o Order the nodes in the tree such that the best nodes are checked first.
o Use domain knowledge while finding the best move. For example, in chess, try this order: captures first, then threats, then forward moves, then backward moves.
o We can bookkeep the states, as there is a possibility that states may repeat.

4. Monte Carlo Tree Search (MCTS):

Monte Carlo Tree Search (MCTS) is a search technique in the field of Artificial Intelligence
(AI). It is a probabilistic and heuristic driven search algorithm that combines the classic tree
search implementations alongside machine learning principles of reinforcement learning.
In tree search, there is always the possibility that the current best action is actually not the most optimal action. In such cases, the MCTS algorithm becomes useful, as it continues to evaluate other alternatives periodically during the learning phase by executing them, instead of only the currently perceived optimal strategy. This is known as the "exploration-exploitation trade-off". It exploits the actions and strategies found to be the best so far, but it must also continue to explore the local space of alternative decisions and find out whether they could replace the current best.

Exploration helps in discovering the unexplored parts of the tree, which could result in finding a more optimal path. In other words, we can say that exploration expands the tree's breadth more than its depth. Exploration is useful to ensure that MCTS is not overlooking any potentially better paths. But it quickly becomes inefficient in situations with a large number of steps or repetitions. To avoid that, it is balanced out by exploitation. Exploitation sticks to a single path that has the greatest estimated value. This is a greedy approach, and it extends the tree's depth more than its breadth. In simple words, the UCB formula applied to trees helps to balance the exploration-exploitation trade-off by periodically exploring relatively unexplored nodes of the tree and discovering potentially more optimal paths than the one currently being exploited.
For this characteristic, MCTS becomes particularly useful in making optimal decisions in
Artificial Intelligence (AI) problems.

Monte Carlo Tree Search (MCTS) algorithm:


In MCTS, nodes are the building blocks of the search tree. These nodes are formed based on
the outcome of a number of simulations. The process of Monte Carlo Tree Search can be
broken down into four distinct steps, viz., selection, expansion, simulation, and backpropagation. Each of these steps is explained in detail below:

 Selection: In this process, the MCTS algorithm traverses the current tree from the root node
using a specific strategy. The strategy uses an evaluation function to optimally select nodes
with the highest estimated value. MCTS uses the Upper Confidence Bound (UCB) formula
applied to trees as the strategy in the selection process to traverse the tree. It balances the
exploration-exploitation trade-off. During tree traversal, a node is selected based on some
parameters that return the maximum value. The formula typically used for this purpose is:

S_i = x_i + C * sqrt(ln t / t_i)

where:
S_i = value of a node i
x_i = empirical mean of a node i
C = a constant
t = total number of simulations
t_i = number of simulations that have passed through node i

When traversing the tree during the selection process, the child node that returns the greatest value from the above equation is the one that gets selected. During traversal, once a child node is found which is also a leaf node, MCTS jumps into the expansion step. (A small code sketch of this selection rule is given after this list.)
 Expansion: In this process, a new child node is added to the tree to that node which was
optimally reached during the selection process.
 Simulation: In this process, a simulation is performed by choosing moves or strategies until
a result or predefined state is achieved.
 Backpropagation: After determining the value of the newly added node, the remaining tree must be updated. The backpropagation process therefore propagates from the new node back to the root node. During this process, the simulation count stored in each node is incremented, and if the new node's simulation resulted in a win, the win count is also incremented.
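Here is the small code sketch of the selection rule referred to above: a UCB-style score per child, with unvisited children given priority. The function names and the (value_sum, visits) bookkeeping are illustrative assumptions.

import math

def ucb1(value_sum, visits, total_simulations, C=1.41):
    if visits == 0:
        return math.inf                 # try every child at least once
    mean = value_sum / visits           # empirical mean x_i of the node
    return mean + C * math.sqrt(math.log(total_simulations) / visits)

def select_child(children):
    # children: list of (value_sum, visits) pairs; pick the max-UCB index.
    t = sum(visits for _, visits in children)
    return max(range(len(children)),
               key=lambda i: ucb1(children[i][0], children[i][1], t))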
The above steps can be visually understood by the diagram given below:


These types of algorithms are particularly useful in turn-based games where there is no element
of chance in the game mechanics, such as Tic Tac Toe, Connect 4, Checkers, Chess, Go, etc.
This has recently been used by Artificial Intelligence Programs like AlphaGo, to play against
the world’s top Go players. But, its application is not limited to games only. It can be used in
any situation which is described by state-action pairs and simulations used to forecast
outcomes.
As we can see, the MCTS algorithm reduces to a small set of functions which we can reuse for any choice of game or for any optimizing strategy.

Advantages of Monte Carlo Tree Search:

1. MCTS is a simple algorithm to implement.


2. Monte Carlo Tree Search is a heuristic algorithm. MCTS can operate effectively without
any knowledge in the particular domain, apart from the rules and end conditions, and can
find its own moves and learn from them by playing random playouts.
3. The MCTS can be saved in any intermediate state and that state can be used in future use
cases whenever required.
4. MCTS supports asymmetric expansion of the search tree based on the circumstances in
which it is operating.
Disadvantages of Monte Carlo Tree Search:

1. As the tree growth becomes rapid after a few iterations, it requires a huge amount of
memory.
2. There is a bit of a reliability issue with Monte Carlo Tree Search. In certain scenarios, there might be a single branch or path that leads to a loss against the opponent when implemented for turn-based games. This is mainly due to the vast number of combinations: each node might not be visited enough times to understand its result or outcome in the long run.
3. MCTS algorithm needs a huge number of iterations to be able to effectively decide the most
efficient path. So, there is a bit of a speed issue there.

5. Stochastic games:
Many unforeseeable external occurrences can place us in unforeseen circumstances in real life.
Many games, such as dice tossing, have a random element to reflect this unpredictability.
These are known as stochastic games. Backgammon is a classic game that mixes skill and luck.
The legal moves are determined by rolling dice at the start of each player's turn. White, for example, has rolled a 6–5 and has four alternative moves in the backgammon scenario shown in the figure below.

This is a standard backgammon position. The object of the game is to get all of one’s pieces off
the board as quickly as possible. White moves in a clockwise direction toward 25, while Black
moves in a counterclockwise direction toward 0. A piece can advance to any position unless several opponent pieces are there; if there is exactly one opponent piece there, it is captured and must start over. White has rolled a 6–5 and must pick between four valid moves: (5–10,5–11), (5–11,19–
24), (5–10,10–16), and (5–11,11–16), where the notation (5–11,11–16) denotes moving one
piece from position 5 to 11 and then another from 11 to 16.
Stochastic game tree for a backgammon position
White knows his or her own legal moves, but he or she has no idea how Black will roll, and
thus has no idea what Black’s legal moves will be. That means White won’t be able to build a
normal game tree like the one in chess or tic-tac-toe. In backgammon, in addition to MAX and MIN
nodes, a game tree must include chance nodes. The figure below depicts chance nodes as
circles. The possible dice rolls are indicated by the branches leading from each chance node;
each branch is labelled with the roll and its probability. There are 36 different ways to roll two
dice, each equally likely, yet there are only 21 distinct rolls because a 6–5 is the same as a 5–6.


P (1–1) = 1/36 because each of the six doubles (1–1 through 6–6) has a probability of 1/36.
Each of the other 15 rolls has a 1/18 chance of happening.

The following phase is to learn how to make good decisions. Obviously, we want to choose the
move that will put us in the best position. Positions, however, do not have definite minimax values. Instead, we can only compute a position's expected value, which is the average over all possible outcomes of the chance nodes.
As a result, we can generalize the deterministic minimax value to an expected-minimax value for games with chance nodes. Terminal nodes and MAX and MIN nodes (for which the dice roll is known) work exactly as before. For chance nodes, we compute the expected value, which is the sum of the values over all outcomes, weighted by the probability of each chance action.
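A minimal Python sketch of this expected-minimax (expectiminimax) recursion. The node structure, with a kind field and children lists, is an assumed representation: max/min nodes hold child nodes, while chance nodes hold (probability, child) pairs.

def expectiminimax(node):
    if node.kind == "terminal":
        return node.utility
    if node.kind == "max":
        return max(expectiminimax(c) for c in node.children)
    if node.kind == "min":
        return min(expectiminimax(c) for c in node.children)
    # Chance node: probability-weighted average over the dice outcomes.
    return sum(p * expectiminimax(c) for p, c in node.children)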

6. Partially observable games:


A partially observable system is one in which the entire state of the system is not fully visible
to an external sensor. In a partially observable system the observer may utilise a memory
system in order to add information to the observer's understanding of the system.
An example of a partially observable system would be a card game in which some of the cards
are discarded into a pile face down. In this case the observer is only able to view their own
cards and potentially those of the dealer. They are not able to view the face-down (used) cards, nor the cards that will be dealt at some stage in the future. A memory system can be used to
remember the previously dealt cards that are now on the used pile. This adds to the total sum of
knowledge that the observer can use to make decisions.
In contrast, a fully observable system would be that of chess. In chess (apart from the 'who is
moving next' state, and minor subtleties such as whether a side has castled, which may not be
clear) the full state of the system is observable at any point in time.
Partially observable is a term used in a variety of mathematical settings, including that of
artificial intelligence and partially observable Markov decision processes.

7. Constraint satisfaction problems:


We have seen several techniques, such as local search and adversarial search, for solving different problems. The objective of every problem-solving technique is the same: to find a solution that reaches the goal. However, in adversarial search and local search there were no constraints on the agents while solving the problems and reaching their solutions.

This section introduces the constraint satisfaction technique. As the name suggests, constraint satisfaction means solving a problem under certain constraints or rules.
Constraint satisfaction is a technique where a problem is solved when its values satisfy certain
constraints or rules of the problem. Such type of technique leads to a deeper understanding of
the problem structure as well as its complexity.
Constraint satisfaction depends on three components, namely:
X: It is a set of variables.
D: It is a set of domains where the variables reside. There is a specific domain for each
variable.
C: It is a set of constraints which are followed by the set of variables.
In constraint satisfaction, domains are the spaces where the variables reside, following the
problem specific constraints. These are the three main elements of a constraint satisfaction
technique. The constraint value consists of a pair of {scope, rel}. The scope is a tuple of
variables which participate in the constraint and rel is a relation which includes a list of values
which the variables can take to satisfy the constraints of the problem.
Solving Constraint Satisfaction Problems
The requirements for solving a constraint satisfaction problem (CSP) are:

 A state-space
 The notion of the solution.

A state in state-space is defined by assigning values to some or all variables such as


{X1=v1, X2=v2, and so on…}.
An assignment of values to a variable can be done in three ways:


 Consistent or Legal Assignment: An assignment which does not violate any constraint
or rule is called Consistent or legal assignment.
 Complete Assignment: An assignment where every variable is assigned with a value,
and the solution to the CSP remains consistent. Such assignment is known as Complete
assignment.
 Partial Assignment: An assignment which assigns values to some of the variables only. Such assignments are called partial assignments.

Types of Domains in CSP


There are following two types of domains which are used by the variables :

 Discrete Domain: It is an infinite domain which can have one state for multiple variables. For example, a start state can be allocated infinitely many times for each variable.
 Finite Domain: It is a finite domain which can have continuous states describing one
domain for one specific variable. It is also called a continuous domain.

Constraint Types in CSP


With respect to the variables, basically there are following types of constraints:

 Unary Constraints: It is the simplest type of constraints that restricts the value of a
single variable.
 Binary Constraints: It is the constraint type which relates two variables. For example, a variable x2 may be constrained to take a value that lies between x1 and x3.
 Global Constraints: It is the constraint type which involves an arbitrary number of
variables.
 Some special types of solution algorithms are used to solve the following types of
constraints:
 Linear Constraints: These type of constraints are commonly used in linear
programming where each variable containing an integer value exists in linear form
only.
 Non-linear Constraints: These type of constraints are used in non-linear programming
where each variable (an integer value) exists in a non-linear form.

Note: A special constraint which works in real-world is known as Preference constraint.

Constraint Propagation

In local state-spaces, there is only one choice, i.e., to search for a solution. But in CSP, we have two choices: either

 We can search for a solution or


 We can perform a special type of inference called constraint propagation.


Constraint propagation is a special type of inference which helps in reducing the number of legal values for the variables. The idea behind constraint propagation is local consistency.
In local consistency, variables are treated as nodes, and each binary constraint is treated as
an arc in the given problem. There are following local consistencies which are discussed
below:

 Node Consistency: A single variable is said to be node consistent if all the values in the

variable’s domain satisfy the unary constraints on the variables.

 Arc Consistency: A variable is arc consistent if every value in its domain satisfies the
binary constraints of the variables.
 Path Consistency: A pair of variables is path consistent with a third variable if every consistent assignment to the pair can be extended to the third variable while satisfying all the binary constraints. It is similar in spirit to arc consistency.
 k-consistency: This type of consistency is used to define the notion of stronger forms of
propagation. Here, we examine the k-consistency of the variables.

CSP Problems
Constraint satisfaction includes those problems which contain some constraints to be satisfied while solving the problem. CSP includes the following problems:

 Graph Coloring: The problem where the constraint is that no two adjacent regions (or vertices) can have the same color (a backtracking code sketch for this problem appears after this list).

 Sudoku Playing: The gameplay where the constraint is that no digit from 1-9 can be repeated in the same row or column.


 n-queen problem: In the n-queen problem, the constraint is that no two queens attack each other, i.e., no two queens share the same row, column, or diagonal.

Note: The n-queen problem is already discussed in Problem-solving in AI section.

 Crossword: In the crossword problem, the constraint is that the words must be formed
correctly and must be meaningful.


 Latin square Problem: The task is to fill an n×n grid with n symbols so that each symbol
occurs exactly once in every row and every column; the rows may be shuffled but contain
the same digits.

 Cryptarithmetic Problem: The most important constraint in this problem is that each
letter stands for exactly one digit, and no two letters may stand for the same
digit.


Cryptarithmetic Problem

Cryptarithmetic Problem is a type of constraint satisfaction problem in which digits are
replaced by letters or other symbols. In a cryptarithmetic problem, the digits (0-9) get
substituted by letters or symbols. The task in a cryptarithmetic problem is to substitute each
letter with a digit so that the result is arithmetically correct.

We can perform all the arithmetic operations on a given cryptarithmetic problem.


The rules or constraints on a cryptarithmetic problem are as follows:

 Each letter must be replaced by a unique digit (a one-to-one mapping between letters and digits).


 The result should satisfy the predefined arithmetic rules, i.e., 2+2=4, nothing else.
 Digits should be from 0-9 only.
 The carry forward in each column of the addition can be at most 1.
 The problem can be solved from both sides, i.e., the left-hand side (L.H.S.) or the
right-hand side (R.H.S.).

Let’s understand the cryptarithmetic problem as well its constraints better with the help
of an example:

 Given a cryptarithmetic problem, i.e., S E N D + M O R E = M O N E Y

 In this example, add both terms S E N D and M O R E to bring M O N E Y as a result.

Follow the below steps to understand the given problem by breaking it into its subparts:

 Starting from the left-hand side (L.H.S.), the leading letters are S and M. Since M is
the carry out of the leftmost column, M must be 1; assigning S->9 then makes that
column work. Let’s assign S->9 and M->1.


Hence S + M = 9 + 1 = 10, which writes 0 and carries 1, so we also get the assignment O-
>0.
 Now, move ahead to the next terms E and O to get N as its output.

Adding E and O gives 5+0=5, which would make N=5; but that is not possible because, according to
the cryptarithmetic constraints, we cannot assign the same digit to two letters (E is already 5). So, we
need to think more and assign some other value.

Note: When we solve further, this column receives a carry of 1 (E + O + 1 = 5 + 0 + 1 = 6), so
N->6 and the constraint is satisfied.

 Further, adding the next two terms N and R we get,


But we have already assigned E->5, so at first the result appears to give a different value for E,
which does not satisfy the constraints. Again, after solving the whole problem, a carry arrives from
the column to the right (N + R + 1 = 6 + 8 + 1 = 15), so the column writes E = 5 and our answer
is satisfied.

where 1 will be carry forward to the above term


Let’s move ahead.

 Again, on adding the last two terms, i.e., the rightmost terms D and E, we get Y as the
result: D + E = 7 + 5 = 12, so Y->2 with a carry of 1.

where 1 will be carry forward to the above term


 Keeping all the constraints in mind, the final result is 9567 + 1085 = 10652.

 The assignment of the digits to the letters is therefore S->9, E->5, N->6, D->7, M->1, O->0, R->8, Y->2.
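As a cross-check, the whole puzzle can also be solved mechanically. The following self-contained Python sketch is a brute-force baseline (not the hand derivation above): it simply tries every assignment of distinct digits to the eight letters.

from itertools import permutations

def solve_send_more_money():
    letters = "SENDMORY"                      # the 8 distinct letters
    for digits in permutations(range(10), len(letters)):
        a = dict(zip(letters, digits))
        if a["S"] == 0 or a["M"] == 0:        # leading letters cannot be 0
            continue
        send = 1000*a["S"] + 100*a["E"] + 10*a["N"] + a["D"]
        more = 1000*a["M"] + 100*a["O"] + 10*a["R"] + a["E"]
        money = 10000*a["M"] + 1000*a["O"] + 100*a["N"] + 10*a["E"] + a["Y"]
        if send + more == money:
            return a
    return None

print(solve_send_more_money())   # yields S=9, E=5, N=6, D=7, M=1, O=0, R=8, Y=2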


More examples of cryptarithmetic problems, such as TWO + TWO = FOUR, can be solved in the same way.


8. Backtracking search for CSP:

Backtracking search, a form of depth-first search, is commonly used for solving CSPs. Inference
can be interwoven with search.
Commutativity: CSPs are all commutative. A problem is commutative if the order of
application of any given set of actions has no effect on the outcome.
Backtracking search: A depth-first search that chooses values for one variable at a time and
backtracks when a variable has no legal values left to assign.
Backtracking algorithm repeatedly chooses an unassigned variable, and then tries all values in
the domain of that variable in turn, trying to find a solution. If an inconsistency is detected, then
BACKTRACK returns failure, causing the previous call to try another value.
There is no need to supply BACKTRACKING-SEARCH with a domain-specific initial state,
action function, transition model, or goal test.
BACKTRACKING-SEARCH keeps only a single representation of a state and alters that
representation rather than creating new ones.
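A minimal Python sketch of this algorithm is given below; csp.variables, csp.domains, and csp.consistent(var, value, assignment) are assumed, illustrative helpers rather than a fixed API.

def backtracking_search(csp):
    return backtrack({}, csp)

def backtrack(assignment, csp):
    if len(assignment) == len(csp.variables):
        return assignment                      # complete, consistent assignment
    # choose an unassigned variable (SELECT-UNASSIGNED-VARIABLE)
    var = next(v for v in csp.variables if v not in assignment)
    for value in csp.domains[var]:             # ORDER-DOMAIN-VALUES
        if csp.consistent(var, value, assignment):
            assignment[var] = value
            result = backtrack(assignment, csp)
            if result is not None:
                return result
            del assignment[var]                # undo and try the next value
    return None                                # failure: caller backtracks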

To solve CSPs efficiently without domain-specific knowledge, we address the following questions:


1) function SELECT-UNASSIGNED-VARIABLE: which variable should be assigned next?
2) function ORDER-DOMAIN-VALUES: in what order should its values be tried?
3) function INFERENCE: what inferences should be performed at each step in the search?
4) When the search arrives at an assignment that violates a constraint, can the search avoid
repeating this failure?

1. Variable and value ordering

SELECT-UNASSIGNED-VARIABLE
Variable selection—fail-first
Minimum-remaining-values (MRV) heuristic: The idea of choosing the variable with the
fewest “legal” values. A.k.a. the “most constrained variable” or “fail-first” heuristic, it picks a
variable that is most likely to cause a failure soon, thereby pruning the search tree. If some
variable X has no legal values left, the MRV heuristic will select X and failure will be detected
immediately—avoiding pointless searches through other variables.
E.g. After the assignment for WA=red and NT=green, there is only one possible value for SA, so
it makes sense to assign SA=blue next rather than assigning Q.
[Powerful guide]
Degree heuristic: The degree heuristic attempts to reduce the branching factor on future choices
by selecting the variable that is involved in the largest number of constraints on other unassigned
variables. [useful tie-breaker]
e.g. SA is the variable with highest degree 5; the other variables have degree 2 or 3; T has degree
0.

ORDER-DOMAIN-VALUES
Value selection—fail-last
If we are trying to find all the solutions to a problem (not just the first one), then the ordering
does not matter.
Least-constraining-value heuristic: prefers the value that rules out the fewest choices for the
neighboring variables in the constraint graph. (Try to leave the maximum flexibility for
subsequent variable assignments.)


e.g. Suppose we have generated the partial assignment WA=red and NT=green and our next
choice is for Q. Blue would be a bad choice because it eliminates the last legal value left for Q’s
neighbor, SA; the heuristic therefore prefers red to blue.
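Both heuristics are easy to state in code. A sketch, again assuming illustrative csp helpers (legal_values, degree, and rules_out are hypothetical names for the counts described above):

def select_unassigned_variable(csp, assignment):
    # MRV: fewest legal values first; break ties with the degree heuristic.
    unassigned = [v for v in csp.variables if v not in assignment]
    return min(unassigned,
               key=lambda v: (len(csp.legal_values(v, assignment)),
                              -csp.degree(v, assignment)))

def order_domain_values(csp, var, assignment):
    # Least-constraining-value: prefer values that rule out the fewest
    # choices for the neighboring variables.
    return sorted(csp.domains[var],
                  key=lambda val: csp.rules_out(var, val, assignment))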

2. Interleaving search and inference

INFERENCE
forward checking: [One of the simplest forms of inference.] Whenever a variable X is assigned,
the forward-checking process establishes arc consistency for it: for each unassigned variable Y
that is connected to X by a constraint, delete from Y’s domain any value that is inconsistent with
the value chosen for X.
There is no reason to do forward checking if we have already done arc consistency as a
preprocessing step.

Advantage: For many problems the search will be more effective if we combine the MRV
heuristic with forward checking.
Disadvantage: Forward checking only makes the current variable arc-consistent, but doesn’t look
ahead and make all the other variables arc-consistent.
MAC (Maintaining Arc Consistency) algorithm: [More powerful than forward checking; it
can detect inconsistencies that forward checking misses.] After a variable Xi is assigned a value,
the INFERENCE procedure calls AC-3, but instead of a queue of all arcs in the CSP, we start with
only the arcs (Xj, Xi) for all Xj that are unassigned variables that are neighbors of Xi. From there,
AC-3 does constraint propagation in the usual way, and if any variable has its domain reduced
to the empty set, the call to AC-3 fails and we know to backtrack immediately.
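A minimal sketch of forward checking under the same assumed interface (domains holds the live, prunable domains):

def forward_check(csp, var, value, assignment, domains):
    # After assigning var=value, prune inconsistent values from the
    # domains of every unassigned neighbour of var.
    for y in csp.neighbours[var]:
        if y in assignment:
            continue                          # only unassigned neighbours
        domains[y] = [v for v in domains[y]
                      if csp.constraint_ok(var, value, y, v)]
        if not domains[y]:
            return False                      # wipeout: backtrack immediately
    return True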

Intelligent backtracking


chronological backtracking: The BACKTRACKING-SEARCH in Fig 6.5. When a branch of the


search fails, back up to the preceding variable and try a different value for it. (The most recent
decision point is revisited.)
e.g.
Suppose we have generated the partial assignment {Q=red, NSW=green, V=blue, T=red}.
When we try the next variable SA, we see every value violates a constraint.
We back up to T and try a new color, it cannot resolve the problem.
Intelligent backtracking: Backtrack to a variable that was responsible for making one of the
possible values of the next variable (e.g. SA) impossible.
Conflict set for a variable: A set of assignments that are in conflict with some value for that
variable.
(e.g. The set {Q=red, NSW=green, V=blue} is the conflict set for SA.)
backjumping method: Backtracks to the most recent assignment in the conflict set.
(e.g. backjumping would jump over T and try a new value for V.)

Forward checking can supply the conflict set with no extra work.
Whenever forward checking based on an assignment X=x deletes a value from Y’s domain, add
X=x to Y’s conflict set;
If the last value is deleted from Y’s domain, the assignment in the conflict set of Y are added to
the conflict set of X.
In fact, every branch pruned by backjumping is also pruned by forward checking. Hence simple
backjumping is redundant in a forward-checking search or in a search that uses stronger
consistency checking (such as MAC).


Conflict-directed backjumping:
e.g.
consider the partial assignment which is proved to be inconsistent: {WA=red, NSW=red}.
We try T=red next and then assign NT, Q, V, SA, no assignment can work for these last 4
variables.
Eventually we run out of values to try at NT, but simple backjumping cannot help, because NT
doesn’t have a complete conflict set of preceding variables that caused it to fail.
The set {WA, NSW} is a deeper notion of the conflict set for NT: it causes NT together with any
subsequent variables to have no consistent solution. So the algorithm should backtrack to NSW
and skip over T.
A backjumping algorithm that uses conflict sets defined in this way is called conflict-directed
backjumping.
How to Compute:
When a variable’s domain becomes empty, the “terminal” failure occurs, that variable has a
standard conflict set.
Let Xj be the current variable, let conf(Xj) be its conflict set. If every possible value for Xj fails,
backjump to the most recent variable Xi in conf(Xj), and set
conf(Xi) ← conf(Xi)∪conf(Xj) – {Xi}.
The conflict set for a variable means that there is no solution from that variable onward, given
the preceding assignments to the conflict set.
e.g.
assign WA, NSW, T, NT, Q, V, SA.
SA fails, and its conflict set is {WA, NT, Q}. (standard conflict set)
Backjump to Q, its conflict set is {NT, NSW}∪{WA,NT,Q}-{Q} = {WA, NT, NSW}.
Backtrack to NT, its conflict set is {WA}∪{WA,NT,NSW}-{NT} = {WA, NSW}.
Hence the algorithm backjump to NSW. (over T)

After backjumping from a contradiction, how to avoid running into the same problem again:
Constraint learning: The idea of finding a minimum set of variables from the conflict set that
causes the problem. This set of variables, along with their corresponding values, is called a no-
good. We then record the no-good, either by adding a new constraint to the CSP or by keeping a
separate cache of no-goods.

Backtracking occurs when no legal assignment can be found for a variable. Conflict-directed
backjumping backtracks directly to the source of the problem.

9. Local search for CSP:

Local search algorithms for CSPs use a complete-state formulation: the initial state assigns a
value to every variable, and the search changes the value of one variable at a time.
The min-conflicts heuristic: In choosing a new value for a variable, select the value that results
in the minimum number of conflicts with other variables.
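A sketch of the MIN-CONFLICTS loop in Python; csp.conflicts(var, value, assignment), which counts the constraints violated by var=value, is an assumed helper:

import random

def min_conflicts(csp, max_steps=100_000):
    # Complete-state formulation: start from a random full assignment.
    current = {v: random.choice(csp.domains[v]) for v in csp.variables}
    for _ in range(max_steps):
        conflicted = [v for v in csp.variables
                      if csp.conflicts(v, current[v], current) > 0]
        if not conflicted:
            return current                 # no conflicts: a solution
        var = random.choice(conflicted)
        # Pick the value that minimises the number of conflicts.
        current[var] = min(csp.domains[var],
                           key=lambda val: csp.conflicts(var, val, current))
    return None                            # give up after max_steps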


Local search techniques in Section 4.1 can be used in local search for CSPs.
The landscape of a CSP under the min-conflicts heuristic usually has a series of plateaus.
Simulated annealing and plateau search (i.e. allowing sideways moves to another state with the
same score) can help local search find its way off a plateau. This wandering on the plateau can
be directed with tabu search: keeping a small list of recently visited states and forbidding the
algorithm to return to those states.

Constraint weighting: a technique that can help concentrate the search on the important
constraints.
Each constraint is given a numeric weight Wi, initially all 1.
At each step, the algorithm chooses a variable/value pair to change that will result in the lowest
total weight of all violated constraints.
The weights are then adjusted by incrementing the weight of each constraint that is violated by
the current assignment.

Local search can be used in an online setting when the problem changes; this is particularly
important in scheduling problems.

The structure of problems


1. The structure of constraint graph


The structure of the problem as represented by the constraint graph can be used to find solutions
quickly.
e.g. The Australia map-coloring problem can be decomposed into 2 independent subproblems:
coloring Tasmania (T) and coloring the mainland.

Tree: A constraint graph is a tree when any two variables are connected by exactly one path.
Directed arc consistency (DAC): A CSP is defined to be directed arc-consistent under an
ordering of variables X1, X2, … , Xn if and only if every Xi is arc-consistent with each Xj for j>i.
By using DAC, any tree-structured CSP can be solved in time linear in the number of variables.
How to solve a tree-structured CSP:
Pick any variable to be the root of the tree;
Choose an ordering of the variables such that each variable appears after its parent in the tree
(topological sort).
Any tree with n nodes has n−1 arcs, so we can make this graph directed arc-consistent in O(n)
steps, each of which must compare up to d possible domain values for 2 variables, for a total
time of O(nd²).
Once we have a directed arc-consistent graph, we can just march down the list of variables and
choose any remaining value.
Since each link from a parent to its child is arc consistent, we won’t have to backtrack, and can
move linearly through the variables.
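A sketch of this tree-structured solver; topological_order and parent_of are assumed helpers that order the variables so each appears after its parent, and revise is the domain-pruning step from the AC-3 sketch above:

def tree_csp_solver(csp, root):
    order = topological_order(csp, root)          # parents before children
    for xj in reversed(order[1:]):                # leaves first: enforce DAC
        revise(csp, parent_of(csp, xj), xj)       # prune the parent's domain
        if not csp.domains[parent_of(csp, xj)]:
            return None                           # empty domain: no solution
    assignment = {}
    for x in order:                               # forward pass, no backtracking
        p = parent_of(csp, x)
        assignment[x] = next(
            v for v in csp.domains[x]
            if p is None or csp.constraint_ok(p, assignment[p], x, v))
    return assignment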


There are 2 primary ways to reduce more general constraint graphs to trees:
1. Based on removing nodes;


e.g. We can delete SA from the graph by fixing a value for SA and deleting from the domains of
other variables any values that are inconsistent with the value chosen for SA.
The general algorithm:
Choose a subset S of the CSP’s variables such that the constraint graph becomes a tree after
removal of S. S is called a cycle cutset.
For each possible assignment to the variables in S that satisfies all constraints on S,
(a) remove from the domain of the remaining variables any values that are inconsistent with the
assignment for S, and
(b) If the remaining CSP has a solution, return it together with the assignment for S.
Time complexity: O(d^c · (n−c)d²), where c is the size of the cycle cutset.
Cutset conditioning: the name of this overall algorithmic approach. Finding the smallest cycle
cutset is NP-hard, but efficient approximation algorithms are known.
2. Based on collapsing nodes together
Tree decomposition: construct a tree decomposition of the constraint graph into a set of
connected subproblems, each subproblem is solved independently, and the resulting solutions are
then combined.

A tree decomposition must satisfy 3 requirements:


·Every variable in the original problem appears in at least one of the subproblems.
·If 2 variables are connected by a constraint in the original problem, they must appear together
(along with the constraint) in at least one of the subproblems.
·If a variable appears in 2 subproblems in the tree, it must appear in every subproblem along the
path connecting those subproblems.

We solve each subproblem independently.


If any one has no solution, the entire problem has no solution.
If we can solve all the subproblems, then construct a global solution as follows:
First, view each subproblem as a “mega-variable” whose domain is the set of all solutions for the
subproblem.
Then, solve the constraints connecting the subproblems using the efficient algorithm for trees.


A given constraint graph admits many tree decompositions;


In choosing a decomposition, the aim is to make the subproblems as small as possible.
Tree width:
The tree width of a tree decomposition of a graph is one less than the size of the largest
subproblem.
The tree width of the graph itself is the minimum tree width among all its tree decompositions.
Time complexity: O(nd^(w+1)), where w is the tree width of the graph.

2. The structure in the values of variables


By introducing a symmetry-breaking constraint, we can break the value symmetry and reduce
the search space by a factor of n!.
e.g.
Consider the map-coloring problems with n colors, for every consistent solution, there is actually
a set of n! solutions formed by permuting the color names.(value symmetry)
On the Australia map, WA, NT and SA must all have different colors, so there are 3!=6 ways to
assign.
We can impose an arbitrary ordering constraint NT<SA<WA that requires the 3 values to be in
alphabetical order. This constraint ensures that only one of the n! solutions is possible: {NT=blue,
SA=green, WA=red}. (symmetry-breaking constraint)


AL3391 ARTIFICIAL INTELLIGENCE

UNIT IV

1. Knowledge-based agent in Artificial intelligence

o An intelligent agent needs knowledge about the real world for taking decisions and reasoning to act
efficiently.
o Knowledge-based agents are those agents who have the capability of maintaining an internal state of
knowledge, reason over that knowledge, update their knowledge after observations and take actions.
These agents can represent the world with some formal representation and act intelligently.
o Knowledge-based agents are composed of two main parts:
o Knowledge-base and
o Inference system.

A knowledge-based agent must be able to do the following:

o An agent should be able to represent states, actions, etc.


o An agent should be able to incorporate new percepts.
o An agent can update the internal representation of the world.
o An agent can deduce hidden properties of the world.
o An agent can deduce appropriate actions.

The architecture of knowledge-based agent:

The generalized architecture for a knowledge-based agent works as follows: the knowledge-based agent (KBA)
takes input from the environment by perceiving the environment. The input is passed to the inference engine
of the agent, which also communicates with the KB to decide as per the knowledge stored in the KB. The learning
element of the KBA regularly updates the KB by learning new knowledge.

Knowledge base: Knowledge-base is a central component of a knowledge-based agent, it is also known as KB. It is a
collection of sentences (here 'sentence' is a technical term and it is not identical to sentence in English). These
sentences are expressed in a language which is called a knowledge representation language. The knowledge base of a
KBA stores facts about the world.


Why use a knowledge base?

A knowledge base is required so that an agent can update its knowledge, learn from experience, and take actions
according to that knowledge.

Inference system

Inference means deriving new sentences from old ones. The inference system allows us to add a new sentence to the
knowledge base. A sentence is a proposition about the world. The inference system applies logical rules to the KB to
deduce new information.

The inference system generates new facts so that an agent can update the KB. An inference system works mainly
through two methods, which are given as:

o Forward chaining
o Backward chaining

Operations Performed by KBA

Following are three operations which are performed by KBA in order to show the intelligent behavior:

1. TELL: This operation tells the knowledge base what it perceives from the environment.
2. ASK: This operation asks the knowledge base what action it should perform.
3. Perform: It performs the selected action.

A generic knowledge-based agent:

Following is the structure outline of a generic knowledge-based agents program:

function KB-AGENT(percept) returns an action
    persistent: KB, a knowledge base
                t, a counter, initially 0, indicating time
    TELL(KB, MAKE-PERCEPT-SENTENCE(percept, t))
    action ← ASK(KB, MAKE-ACTION-QUERY(t))
    TELL(KB, MAKE-ACTION-SENTENCE(action, t))
    t ← t + 1
    return action

The knowledge-based agent takes percept as input and returns an action as output. The agent maintains the
knowledge base, KB, and it initially has some background knowledge of the real world. It also has a counter to
indicate the time for the whole process, and this counter is initialized with zero.

Each time when the function is called, it performs its three operations:

o First, it TELLs the KB what it perceives.


o Second, it ASKs the KB what action it should take.
o Third, the agent program TELLs the KB which action was chosen.


The MAKE-PERCEPT-SENTENCE generates a sentence asserting that the agent perceived the given percept at the
given time.

The MAKE-ACTION-QUERY generates a sentence to ask which action should be done at the current time.

MAKE-ACTION-SENTENCE generates a sentence which asserts that the chosen action was executed.

Various levels of knowledge-based agent:

A knowledge-based agent can be viewed at different levels which are given below:

1. Knowledge level

Knowledge level is the first level of a knowledge-based agent. At this level, we need to specify what the agent
knows and what the agent's goals are. With these specifications, we can fix its behavior. For example, suppose an
automated taxi agent needs to go from station A to station B, and it knows the way from A to B; this comes at
the knowledge level.

2. Logical level:

At the logical level, knowledge is encoded into logical sentences (possibly in different logics). At the
logical level, we can expect the automated taxi agent to reason its way to the destination B.

3. Implementation level:

This is the physical representation of logic and knowledge. At the implementation level, the agent performs actions as per
the logical and knowledge levels. At this level, an automated taxi agent actually implements its knowledge and logic so that
it can reach the destination.

Approaches to designing a knowledge-based agent:

There are mainly two approaches to build a knowledge-based agent:

1. Declarative approach: We can create a knowledge-based agent by initializing it with an empty knowledge
base and telling the agent all the sentences with which we want to start. This approach is called the
declarative approach.
2. Procedural approach: In the procedural approach, we directly encode the desired behavior as program
code, which means we just need to write a program that already encodes the desired behavior of the agent.

However, in the real world, a successful agent can be built by combining both declarative and procedural approaches,
and declarative knowledge can often be compiled into more efficient procedural code.

2. Propositional logic

Propositional logic (PL) is the simplest form of logic where all the statements are made by propositions. A proposition
is a declarative statement which is either true or false. It is a technique of knowledge representation in logical and
mathematical form.


Example:
a) It is Sunday.
b) The Sun rises from West (False proposition)
c) 3+3= 7(False proposition)
d) 5 is a prime number.

Following are some basic facts about propositional logic:

o Propositional logic is also called Boolean logic as it works on 0 and 1.


o In propositional logic, we use symbolic variables to represent the logic, and we can use any symbol for
representing a proposition, such as A, B, C, P, Q, R, etc.
o Propositions can be either true or false, but it cannot be both.
o Propositional logic consists of an object, relations or function, and logical connectives.
o These connectives are also called logical operators.
o The propositions and connectives are the basic elements of the propositional logic.
o Connectives can be said as a logical operator which connects two sentences.
o A proposition formula which is always true is called tautology, and it is also called a valid sentence.
o A proposition formula which is always false is called Contradiction.
o A proposition formula which has both true and false values (in different models) is called a contingency.
o Statements which are questions, commands, or opinions, such as "Where is Rohini?", "How are you?", and
"What is your name?", are not propositions.

Syntax of propositional logic:

The syntax of propositional logic defines the allowable sentences for the knowledge representation. There are two
types of Propositions:

a. Atomic Propositions
b. Compound propositions

o Atomic Proposition: Atomic propositions are simple propositions. Each consists of a single proposition
symbol. These are the sentences which must be either true or false.

Example:

a) 2+2 is 4, it is an atomic proposition as it is a true fact.


b) "The Sun is cold" is also a proposition as it is a false fact.

o Compound proposition: Compound propositions are constructed by combining simpler or atomic


propositions, using parenthesis and logical connectives.

Example:

a) "It is raining today, and street is wet."


b) "Ankit is a doctor, and his clinic is in Mumbai."


Logical Connectives:

Logical connectives are used to connect two simpler propositions or to represent a sentence logically. We can create
compound propositions with the help of logical connectives. There are mainly five connectives, which are given as
follows:

1. Negation: A sentence such as ¬ P is called negation of P. A literal can be either Positive literal or negative
literal.
2. Conjunction: A sentence which has ∧ connective such as, P ∧ Q is called a conjunction.
Example: Rohan is intelligent and hardworking. It can be written as,
P= Rohan is intelligent,
Q= Rohan is hardworking. → P∧ Q.
3. Disjunction: A sentence which has a ∨ connective, such as P ∨ Q, is called a disjunction, where P and Q are the
propositions.
Example: "Ritika is a doctor or an engineer",
Here P= Ritika is a doctor, Q= Ritika is an engineer, so we can write it as P ∨ Q.
4. Implication: A sentence such as P → Q, is called an implication. Implications are also known as if-then rules.
It can be represented as
If it is raining, then the street is wet.
Let P= It is raining, and Q= Street is wet, so it is represented as P → Q
5. Biconditional: A sentence such as P ⇔ Q is a biconditional sentence. Example: I am breathing if and only if I
am alive.
P= I am breathing, Q= I am alive; it can be represented as P ⇔ Q.

Following is a summarized table for the propositional logic connectives:

Connective word      Symbol   Example   Technical term
and                  ∧        P ∧ Q     Conjunction
or                   ∨        P ∨ Q     Disjunction
implies / if-then    →        P → Q     Implication
if and only if       ⇔        P ⇔ Q     Biconditional
not                  ¬        ¬P        Negation

Truth Table:

In propositional logic, we need to know the truth values of propositions in all possible scenarios. We can combine all
the possible combinations with logical connectives, and the representation of these combinations in a tabular format is
called a truth table. Following is the truth table for all logical connectives:
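Since the original table appears in the source as a figure, a compact reconstruction is given here:

P       Q       ¬P      P ∧ Q   P ∨ Q   P → Q   P ⇔ Q
True    True    False   True    True    True    True
True    False   False   False   True    False   False
False   True    True    False   True    True    False
False   False   True    False   False   True    True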


Truth table with three propositions:

We can build a proposition composing three propositions P, Q, and R. Its truth table is made up of 8 rows
(2³ = 8 combinations), as we have taken three proposition symbols.

Precedence of connectives:

Just like arithmetic operators, there is a precedence order for propositional connectors or logical operators. This order
should be followed while evaluating a propositional problem. Following is the list of the precedence order for
operators:

Precedence            Operators

First Precedence      Parenthesis
Second Precedence     Negation
Third Precedence      Conjunction (AND)
Fourth Precedence     Disjunction (OR)
Fifth Precedence      Implication
Sixth Precedence      Biconditional

Note: For better understanding use parenthesis to make sure of the correct interpretations. Such as ¬R∨ Q, It
can be interpreted as (¬R) ∨ Q

Logical equivalence:

Logical equivalence is one of the features of propositional logic. Two propositions are said to be logically equivalent if
and only if the columns in the truth table are identical to each other.

Let's take two propositions A and B; for logical equivalence, we can write A ⇔ B. In the truth table below, the
columns for ¬A ∨ B and A → B are identical, hence ¬A ∨ B is logically equivalent to A → B.
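A reconstruction of that truth table:

A       B       ¬A      ¬A ∨ B   A → B
True    True    False   True     True
True    False   False   False    False
False   True    True    True     True
False   False   True    True     True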


Properties of Operators:
o Commutativity:
o P∧ Q= Q ∧ P, or
o P ∨ Q = Q ∨ P.
o Associativity:
o (P ∧ Q) ∧ R= P ∧ (Q ∧ R),
o (P ∨ Q) ∨ R= P ∨ (Q ∨ R)
o Identity element:
o P ∧ True = P,
o P ∨ True= True.
o Distributive:
o P∧ (Q ∨ R) = (P ∧ Q) ∨ (P ∧ R).
o P ∨ (Q ∧ R) = (P ∨ Q) ∧ (P ∨ R).
o De Morgan's Laws:
o ¬ (P ∧ Q) = (¬P) ∨ (¬Q)
o ¬ (P ∨ Q) = (¬ P) ∧ (¬Q).
o Double-negation elimination:
o ¬ (¬P) = P.

Limitations of Propositional logic:


o We cannot represent relations like ALL, some, or none with propositional logic. Example:
a. All the girls are intelligent.
b. Some apples are sweet.
Propositional logic has limited expressive power.
In propositional logic, we cannot describe statements in terms of their properties or logical relationships.

3. Propositional theorem proving

Theorem proving: Applying rules of inference directly to the sentences in our knowledge base to
construct a proof of the desired sentence without consulting models.
Inference rules are patterns of sound inference that can be used to find proofs. The resolution rule yields
a complete inference algorithm for knowledge bases that are expressed in conjunctive normal
form. Forward chaining and backward chaining are very natural reasoning algorithms for knowledge
bases in Horn form.
Logical equivalence:
Two sentences α and β are logically equivalent if they are true in the same set of models. (write as α ≡ β).
Also: α ≡ β if and only if α ⊨ β and β ⊨ α.


Validity: A sentence is valid if it is true in all models.


Valid sentences are also known as tautologies—they are necessarily true. Every valid sentence is
logically equivalent to True.
The deduction theorem: For any sentences α and β, α ⊨ β if and only if the sentence (α ⇒ β) is valid.
Satisfiability: A sentence is satisfiable if it is true in, or satisfied by, some model. Satisfiability can be
checked by enumerating the possible models until one is found that satisfies the sentence.
The SAT problem: The problem of determining the satisfiability of sentences in propositional logic.
Validity and satisfiability are connected:
α is valid iff ¬α is unsatisfiable;
α is satisfiable iff ¬α is not valid;
α ⊨ β if and only if the sentence (α∧¬β) is unsatisfiable.
Proving β from α by checking the unsatisfiability of (α∧¬β) corresponds to proof by refutation / proof
by contradiction.

Inference and proofs


Inference rules (such as Modus Ponens and And-Elimination) can be applied to derive a proof.
·Modus Ponens: from α ⇒ β and α, infer β.

Whenever any sentences of the form α⇒β and α are given, the sentence β can be inferred.

·And-Elimination: from α ∧ β, infer α (or β).

From a conjunction, any of the conjuncts can be inferred.

·All of logical equivalence (in Figure 7.11) can be used as inference rules.


e.g. The equivalence for biconditional elimination yields 2 inference rules: from α ⇔ β, infer α ⇒ β; and from
α ⇔ β, infer β ⇒ α.

·De Morgan's rule: ¬(α ∧ β) ≡ (¬α ∨ ¬β), and ¬(α ∨ β) ≡ (¬α ∧ ¬β).

We can apply any of the search algorithms in Chapter 3 to find a sequence of steps that constitutes a
proof. We just need to define a proof problem as follows:
·INITIAL STATE: the initial knowledge base;
·ACTION: the set of actions consists of all the inference rules applied to all the sentences that match the
top half of the inference rule.
·RESULT: the result of an action is to add the sentence in the bottom half of the inference rule.
·GOAL: the goal is a state that contains the sentence we are trying to prove.
In many practical cases, finding a proof can be more efficient than enumerating models, because the
proof can ignore irrelevant propositions, no matter how many of them there are.

Monotonicity: A property of logical systems which says that the set of entailed sentences can only increase as
information is added to the knowledge base.
For any sentences α and β,
if KB ⊨ α then KB ∧ β ⊨ α.
Monotonicity means that inference rules can be applied whenever suitable premises are found in the
knowledge base; whatever else is in the knowledge base cannot invalidate any conclusion already inferred.
Proof by resolution
Resolution: An inference rule that yields a complete inference algorithm when coupled with any complete
search algorithm.
Clause: A disjunction of literals. (e.g. A∨ B). A single literal can be viewed as a unit clause (a disjunction of
one literal ).
Unit resolution inference rule: Takes a clause and a literal and produces a new clause:

(l1 ∨ … ∨ lk), m ⊢ l1 ∨ … ∨ li−1 ∨ li+1 ∨ … ∨ lk

where each l is a literal, and li and m are complementary literals (one is the negation of the other).

Full resolution rule: Takes 2 clauses and produces a new clause:

(l1 ∨ … ∨ lk), (m1 ∨ … ∨ mn) ⊢ l1 ∨ … ∨ li−1 ∨ li+1 ∨ … ∨ lk ∨ m1 ∨ … ∨ mj−1 ∨ mj+1 ∨ … ∨ mn


where li and mj are complementary literals.

Notice: The resulting clause should contain only one copy of each literal. The removal of multiple copies of literal is
called factoring.
e.g. resolving (A ∨ B) with (A ∨ ¬B) yields (A ∨ A), which is reduced to just A.

The resolution rule is sound and complete.

Conjunctive normal form


Conjunctive normal form (CNF): A sentence expressed as a conjunction of clauses is said to be in CNF.
Every sentence of propositional logic is logically equivalent to a conjunction of clauses, after converting a sentence
into CNF, it can be used as input to a resolution procedure.

A resolution algorithm

e.g.
KB = (B1,1⟺(P1,2∨P2,1))∧¬B1,1
α = ¬P1,2


Notice: Any clause in which two complementary literals appear can be discarded, because it is always equivalent
to True.
e.g. B1,1∨¬B1,1∨P1,2 = True∨P1,2 = True.
PL-RESOLUTION is complete.

Horn clauses and definite clauses

Definite clause: A disjunction of literals of which exactly one is positive. (e.g. ¬ L1,1∨¬Breeze∨B1,1)
Every definite clause can be written as an implication, whose premise is a conjunction of positive literals and whose
conclusion is a single positive literal.
Horn clause: A disjunction of literals of which at most one is positive. (All definite clauses are Horn clauses.)
In Horn form, the premise is called the body and the conclusion is called the head.
A sentence consisting of a single positive literal is called a fact, it too can be written in implication form.
Horn clauses are closed under resolution: if you resolve 2 Horn clauses, you get back a Horn clause.
Inference with horn clauses can be done through the forward-chaining and backward-chaining algorithms.
Deciding entailment with Horn clauses can be done in time that is linear in the size of the knowledge base.
Goal clause: A clause with no positive literals.

Forward and backward chaining


forward-chaining algorithm: PL-FC-ENTAILS?(KB, q) (runs in linear time)
Forward chaining is sound and complete.
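A sketch of the forward-chaining idea in Python; the knowledge base is encoded, purely for illustration, as (premises, conclusion) pairs, with known facts carrying empty premise lists:

from collections import deque

def pl_fc_entails(kb, q):
    # count[i]: number of premises of clause i not yet known to be true
    count = {i: len(prem) for i, (prem, _) in enumerate(kb)}
    inferred = set()
    agenda = deque(concl for prem, concl in kb if not prem)  # known facts
    while agenda:
        p = agenda.popleft()
        if p == q:
            return True
        if p not in inferred:
            inferred.add(p)
            for i, (prem, concl) in enumerate(kb):
                if p in prem:
                    count[i] -= 1
                    if count[i] == 0:       # all premises proved: fire the rule
                        agenda.append(concl)
    return False

# Example: facts A, B plus rules (A ∧ B ⇒ L) and (L ⇒ Q) entail Q.
kb = [([], "A"), ([], "B"), (["A", "B"], "L"), (["L"], "Q")]
print(pl_fc_entails(kb, "Q"))   # True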


e.g. A knowledge base of horn clauses with A and B as known facts.

Fixed point: The algorithm reaches a fixed point where no new inferences are possible.
Data-driven reasoning: Reasoning in which the focus of attention starts with the known data. It can be used within
an agent to derive conclusions from incoming percept, often without a specific query in mind. (forward chaining is an
example)

Backward-chaining algorithm: works backward from the query.


If the query q is known to be true, no work is needed;
Otherwise the algorithm finds those implications in the KB whose conclusion is q. If all the premises of one of those
implications can be proved true (by backward chaining), then q is true. (runs in linear time)
in the corresponding AND-OR graph: it works back down the graph until it reaches a set of known facts.
(Backward-chaining algorithm is essentially identical to the AND-OR-GRAPH-SEARCH algorithm.)
Backward-chaining is a form of goal-directed reasoning.


4. Propositional model checking

The set of possible models, given a fixed propositional vocabulary, is finite, so entailment can be checked by
enumerating models. Efficient model-checking inference algorithms for propositional logic include backtracking and
local search methods and can often solve large problems quickly.
2 families of algorithms for the SAT problem based on model checking:
a. based on backtracking
b. based on local hill-climbing search

1. A complete backtracking algorithm


Davis-Putnam algorithm (DPLL):

DPLL embodies 3 improvements over the scheme of TT-ENTAILS?: Early termination, pure symbol heuristic, unit
clause heuristic.
Tricks that enable SAT solvers to scale up to large problems: Component analysis, variable and value
ordering, intelligent backtracking, random restarts, clever indexing.

Local search algorithms


Local search algorithms can be applied directly to the SAT problem, provided that choose the right evaluation
function. (We can choose an evaluation function that counts the number of unsatisfied clauses.)
These algorithms take steps in the space of complete assignments, flipping the truth value of one symbol at a time.
The space usually contains many local minima, to escape from which various forms of randomness are required.
Local search methods such as WALKSAT can be used to find solutions. Such algorithms are sound but not complete.
WALKSAT: one of the simplest and most effective algorithms.


On every iteration, the algorithm picks an unsatisfied clause, and chooses randomly between 2 ways to pick a symbol
to flip:
Either a. a “min-conflicts” step that minimizes the number of unsatisfied clauses in the new state;
Or b. a “random walk” step that picks the symbol randomly.
When the algorithm returns a model, the input sentence is indeed satisfiable;
When the algorithm returns failure, there are 2 possible causes:
Either a. The sentence is unsatisfiable;
Or b. We need to give the algorithm more time.
If we set max_flips=∞ and p>0, the algorithm will:
Either a. eventually return a model if one exists,
Or b. never terminate if the sentence is unsatisfiable.
Thus WALKSAT is useful when we expect a solution to exist, but it cannot always detect unsatisfiability.
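A sketch of WALKSAT in Python; a clause is represented here, for illustration, as a list of (symbol, sign) literals, a literal being satisfied when the model assigns its symbol that sign:

import random

def walksat(clauses, p=0.5, max_flips=10_000):
    symbols = {s for clause in clauses for (s, _) in clause}
    model = {s: random.choice([True, False]) for s in symbols}

    def satisfied(clause):
        return any(model[s] == sign for (s, sign) in clause)

    for _ in range(max_flips):
        unsat = [c for c in clauses if not satisfied(c)]
        if not unsat:
            return model                       # a satisfying model
        clause = random.choice(unsat)
        if random.random() < p:
            sym = random.choice(clause)[0]     # "random walk" step
        else:
            # "min-conflicts" step: flip the symbol satisfying most clauses
            def score(s):
                model[s] = not model[s]
                n = sum(satisfied(c) for c in clauses)
                model[s] = not model[s]
                return n
            sym = max((s for (s, _) in clause), key=score)
        model[sym] = not model[sym]
    return None          # failure: unsatisfiable, or more time is needed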

The landscape of random SAT problems


Underconstrained problem: When we look at satisfiability problems in CNF, an underconstrained problem is one
with relatively few clauses constraining the variables.
An overconstrained problem has many clauses relative to the number of variables and is likely to have no
solutions.

The notation CNFk(m, n) denotes a k-CNF sentence with m clauses and n symbols. (with n variables and k literals per
clause).
Given a source of random sentences, where the clauses are chosen uniformly, independently and without
replacement from among all clauses with k different literals, which are positive or negative at random.
Hardness: problems right at the threshold > overconstrained problems > underconstrained problems


Satisfiability threshold conjecture: A conjecture that for every k≥3, there is a threshold ratio rk such that, as n goes
to infinity, the probability that CNFk(n, rn) is satisfiable becomes 1 for all values of r below the threshold, and 0 for all
values above it. (It remains unproven.)

5. Agents based on propositional logic


1. The current state of the world
We can associate propositions with timestamps to avoid contradiction.
e.g. ¬Stench^3, Stench^4
fluent: refers to an aspect of the world that changes over time (e.g. L^t_x,y, the agent's location at time t).
atemporal variables: Symbols associated with permanent aspects of the world do not need a time superscript.
Effect axioms: specify the outcome of an action at the next time step.

Frame problem: some information is lost because the effect axioms fail to state what remains unchanged as the
result of an action.
Solution: add frame axioms explicitly asserting all the propositions that remain the same.

Representation frame problem: The proliferation of frame axioms is inefficient, the set of frame axioms will be
O(mn) in a world with m different actions and n fluents.
Solution: because the world exhibits locality (each action typically changes no more than some small number
k of the fluents), define the transition model with a set of axioms of size O(mk) rather than size O(mn).

Inferential frame problem: The problem of projecting forward the results of a t-step plan of action in time O(kt) rather
than O(nt).
Solution: change one’s focus from writing axioms about actions to writing axioms about fluents.
For each fluent F, we will have an axiom that defines the truth value of F^(t+1) in terms of the fluents at time t and the
actions that may have occurred at time t.
The truth value of F^(t+1) can be set in one of 2 ways:
Either a. the action at time t causes F to be true at t+1,
Or b. F was already true at time t and the action at time t does not cause it to be false.
An axiom of this form is called a successor-state axiom and has this schema:
F^(t+1) ⇔ ActionCausesF^t ∨ (F^t ∧ ¬ActionCausesNotF^t)
Qualification problem: specifying all unusual exceptions that could cause the action to fail.

2. A hybrid agent


Hybrid agent: combines the ability to deduce various aspects of the state of the world with condition-action rules and
with problem-solving algorithms.
The agent maintains and updates a knowledge base as well as a current plan.
The initial KB contains the atemporal axioms (those that don't depend on t).
At each time step, the new percept sentence is added along with all the axioms that depend on t (such as the
successor-state axioms).
Then the agent uses logical inference, by ASKing questions of the KB, to work out which squares are safe and which
have yet to be visited.

The main body of the agent program constructs a plan based on a decreasing priority of goals:
1. If there is a glitter, construct a plan to grab the gold, follow a route back to the initial location and climb out of the
cave;
2. Otherwise if there is no current plan, plan a route (with A* search) to the closest safe square unvisited yet, making
sure the route goes through only safe squares;
3. If there are no safe squares to explore and it still has an arrow, try to make a safe square by shooting at one of the
possible wumpus locations.
4. If this fails, look for a square to explore that is not provably unsafe.
5. If there is no such square, the mission is impossible, then retreat to the initial location and climb out of the cave.


Weakness: The computational expense goes up as time goes by.

3. Logical state estimation


To get a constant update time, we need to cache the result of inference.
Belief state: Some representation of the set of all possible current state of the world. (used to replace the past
history of percepts and all their ramifications)

We use a logical sentence involving the proposition symbols associated with the current time step and the temporal
symbols.
Logical state estimation involves maintaining a logical sentence that describes the set of possible states consistent
with the observation history. Each update step requires inference using the transition model of the environment,
which is built from successor-state axioms that specify how each fluent changes.
State estimation: The process of updating the belief state as new percepts arrive.

Exact state estimation may require logical formulas whose size is exponential in the number of symbols.
One common scheme for approximate state estimation: to represent belief state as conjunctions of literals (1-CNF
formulas).
The agent simply tries to prove Xt and ¬Xt for each symbol Xt, given the belief state at t-1.
The conjunction of provable literals becomes the new belief state, and the previous belief state is discarded.
(This scheme may lose some information as time goes along.)

The set of possible states represented by the 1-CNF belief state includes all states that are in fact possible given the
full percept history. The 1-CNF belief state acts as a simple outer envelope, or conservative approximation.

4. Making plans by propositional inference


We can make plans by logical inference instead of A* search in Figure 7.20.
Basic idea:
1. Construct a sentence that includes:
a) Init^0: a collection of assertions about the initial state;
b) Transition^1, …, Transition^t: the successor-state axioms for all possible actions at each time up to some maximum
time t;
c) HaveGold^t ∧ ClimbedOut^t: the assertion that the goal is achieved at time t.
2. Present the whole sentence to a SAT solver. If the solver finds a satisfying model, the goal is achievable; else the
planning is impossible.
3. Assuming a model is found, extract from the model those variables that represent actions and are assigned true.


Together they represent a plan to achieve the goals.


Decisions within a logical agent can be made by SAT solving: finding possible models specifying future action
sequences that reach the goal. This approach works only for fully observable or sensorless environments.
SATPLAN: a propositional planning algorithm. (It cannot be used in a partially observable environment.)
SATPLAN finds models for a sentence containing the initial state, the goal, the successor-state axioms, and the action
exclusion axioms.
(Because the agent does not know how many steps it will take to reach the goal, the algorithm tries each possible
number of steps t up to some maximum conceivable plan length Tmax.)

Precondition axioms: stating that an action occurrence requires the preconditions to be satisfied, added to avoid
generating plans with illegal actions.
Action exclusion axioms: added to avoid the creation of plans with multiple simultaneous actions that interfere with
each other.
Propositional logic does not scale to environments of unbounded size because it lacks the expressive power to deal
concisely with time, space, and universal patterns of relationships among objects.

6. First-order logic

First-Order Logic in Artificial intelligence

In the topic of propositional logic, we have seen how to represent statements using propositional logic. But
unfortunately, in propositional logic, we can only represent facts which are either true or false. PL is not sufficient
to represent complex sentences or natural-language statements; propositional logic has very limited
expressive power. Consider the following sentences, which we cannot represent using PL:

o "Some humans are intelligent", or


o "Sachin likes cricket."

To represent the above statements, PL is not sufficient, so we require a more powerful logic, such as first-
order logic.

First-Order logic:
o First-order logic is another way of knowledge representation in artificial intelligence. It is an extension to
propositional logic.


o FOL is sufficiently expressive to represent the natural language statements in a concise way.
o First-order logic is also known as Predicate logic or First-order predicate logic. First-order logic is a
powerful language that expresses information about the objects in a more natural way and can also express the
relationships between those objects.
o First-order logic (like natural language) does not only assume that the world contains facts like propositional
logic but also assumes the following things in the world:
o Objects: A, B, people, numbers, colors, wars, theories, squares, pits, wumpus, ......
o Relations: It can be a unary relation such as: red, round, is adjacent; or an n-ary relation such as: the
sister of, brother of, has color, comes between
o Function: Father of, best friend, third inning of, end of, ......
o As a natural language, first-order logic also has two main parts:
a. Syntax
b. Semantics

Syntax of First-Order logic:

The syntax of FOL determines which collection of symbols is a logical expression in first-order logic. The basic
syntactic elements of first-order logic are symbols. We write statements in short-hand notation in FOL.

Basic Elements of First-order logic:

Following are the basic elements of FOL syntax:

Constant      1, 2, A, John, Mumbai, cat, ...
Variables     x, y, z, a, b, ...
Predicates    Brother, Father, >, ...
Function      sqrt, LeftLegOf, ...
Connectives   ∧, ∨, ¬, ⇒, ⇔
Equality      ==
Quantifier    ∀, ∃

Atomic sentences:
o Atomic sentences are the most basic sentences of first-order logic. These sentences are formed from a
predicate symbol followed by a parenthesis with a sequence of terms.
o We can represent atomic sentences as Predicate (term1, term2, ......, term n).

Example: Ravi and Ajay are brothers: => Brothers(Ravi, Ajay).


Chinky is a cat: => cat (Chinky).

Complex Sentences:
o Complex sentences are made by combining atomic sentences using connectives.

First-order logic statements can be divided into two parts:

o Subject: Subject is the main part of the statement.


o Predicate: A predicate can be defined as a relation, which binds two atoms together in a statement.


Consider the statement: "x is an integer.", it consists of two parts, the first part x is the subject of the statement and
second part "is an integer," is known as a predicate.

Quantifiers in First-order logic:


o A quantifier is a language element which generates quantification, and quantification specifies the quantity
of specimens in the universe of discourse.
o These are the symbols that permit us to determine or identify the range and scope of the variable in a logical
expression. There are two types of quantifier:
a. Universal Quantifier, (for all, everyone, everything)
b. Existential quantifier, (for some, at least one).

Universal Quantifier:

Universal quantifier is a symbol of logical representation, which specifies that the statement within its range is true for
everything or every instance of a particular thing.

The Universal quantifier is represented by a symbol ∀, which resembles an inverted A.

Note: In universal quantifier we use implication "→".

If x is a variable, then ∀x is read as:

o For all x
o For each x
o For every x.

Example:

All men drink coffee.

Let the variable x refer to a man, so all x can be represented in the UoD as below:


∀x man(x) → drink (x, coffee).

It will be read as: for all x, if x is a man, then x drinks coffee.

Existential Quantifier:

Existential quantifiers are the type of quantifiers, which express that the statement within its scope is true for at least
one instance of something.

It is denoted by the logical operator ∃, which resembles an inverted E. When it is used with a predicate variable, it
is called an existential quantifier.

Note: In Existential quantifier we always use AND or Conjunction symbol ( ∧).

If x is a variable, then existential quantifier will be ∃x or ∃(x). And it will be read as:

o There exists a 'x.'


o For some 'x.'
o For at least one 'x.'

Example:

Some boys are intelligent.


∃x: boys(x) ∧ intelligent(x)

It will be read as: there is some x such that x is a boy and x is intelligent.

Points to remember:
o The main connective for universal quantifier ∀ is implication →.
o The main connective for existential quantifier ∃ is and ∧.

Properties of Quantifiers:
o In universal quantifier, ∀x∀y is similar to ∀y∀x.
o In Existential quantifier, ∃x∃y is similar to ∃y∃x.
o ∃x∀y is not similar to ∀y∃x.

Some Examples of FOL using quantifier:

1. All birds fly.


In this question, the predicate is "fly(bird)."
Since all birds fly, it will be represented as follows:
∀x bird(x) → fly(x).

2. Every man respects his parent.


In this question, the predicate is "respect(x, y)," where x=man, and y= parent.
Since there is every man so will use ∀, and it will be represented as follows:
∀x man(x) → respects (x, parent).

3. Some boys play cricket.


In this question, the predicate is "play(x, y)," where x= boys, and y= game. Since there are some boys so we will
use ∃, and it will be represented as:
∃x boys(x) → play(x, cricket).

4. Not all students like both Mathematics and Science.


In this question, the predicate is "like(x, y)," where x= student, and y= subject.


Since not all students are included, we will use ∀ with negation, giving the following representation:
¬∀ (x) [ student(x) → like(x, Mathematics) ∧ like(x, Science) ].

5. Only one student failed in Mathematics.


In this question, the predicate is "failed(x, y)," where x= student, and y= subject.
Since there is only one student who failed in Mathematics, so we will use following representation for this:
∃(x) [ student(x) → failed (x, Mathematics) ∧∀ (y) [¬(x==y) ∧ student(y) → ¬failed (x, Mathematics)].

Free and Bound Variables:

The quantifiers interact with variables which appear in a suitable way. There are two types of variables in First-order
logic which are given below:

Free Variable: A variable is said to be a free variable in a formula if it occurs outside the scope of the quantifier.

Example: ∀x ∃(y) [ P(x, y, z) ], where z is a free variable.

Bound Variable: A variable is said to be a bound variable in a formula if it occurs within the scope of the quantifier.

Example: ∀x ∃y [ A(x) ∧ B(y) ]; here x and y are bound variables.

7. Knowledge representation and engineering

Knowledge Engineering in First-order logic

What is knowledge-engineering?

The process of constructing a knowledge base in first-order logic is called knowledge engineering. Someone who
investigates a particular domain, learns the important concepts of that domain, and generates
a formal representation of the objects is known as a knowledge engineer.

In this topic, we will understand the Knowledge engineering process in an electronic circuit domain, which is already
familiar. This approach is mainly suitable for creating special-purpose knowledge base.

The knowledge-engineering process:

Following are the main steps of the knowledge-engineering process. Using these steps, we will develop a
knowledge base which will allow us to reason about a digital circuit (a one-bit full adder), as described below.


1. Identify the task:

The first step of the process is to identify the task, and for the digital circuit, there are various reasoning tasks

At the first level or highest level, we will examine the functionality of the circuit:

o Does the circuit add properly?


o What will be the output of gate A2, if all the inputs are high?

At the second level, we will examine the circuit structure details such as:

o Which gate is connected to the first input terminal?


o Does the circuit have feedback loops?

2. Assemble the relevant knowledge:

In the second step, we will assemble the relevant knowledge which is required for digital circuits. So for digital circuits,
we have the following required knowledge:

o Logic circuits are made up of wires and gates.


o Signal flows through wires to the input terminal of the gate, and each gate produces the corresponding
output which flows further.
o In this logic circuit, there are four types of gates used: AND, OR, XOR, and NOT.
o All these gates have one output terminal and two input terminals (except NOT gate, it has one input
terminal).

3. Decide on vocabulary:

The next step of the process is to select functions, predicate, and constants to represent the circuits, terminals, signals,
and gates. Firstly we will distinguish the gates from each other and from other objects. Each gate is represented as an
object which is named by a constant, such as, Gate(X1). The functionality of each gate is determined by its type,
which is taken as constants such as AND, OR, XOR, or NOT. Circuits will be identified by a predicate: Circuit (C1).

For the terminal, we will use predicate: Terminal(x).

For gate input, we will use the function In(1, X1) for denoting the first input terminal of the gate, and for output
terminal we will use Out (1, X1).

The function Arity(c, i, j) is used to denote that circuit c has i inputs and j outputs.

The connectivity between gates can be represented by predicate Connect(Out(1, X1), In(1, X1)).

We use a unary predicate On (t), which is true if the signal at a terminal is on.

4. Encode general knowledge about the domain:

To encode the general knowledge about the logic circuit, we need some following rules:

o If two terminals are connected, then they have the same signal; this can be represented as:


∀ t1, t2 Terminal(t1) ∧ Terminal(t2) ∧ Connect(t1, t2) → Signal(t1) = Signal(t2).

o The signal at every terminal will have either the value 0 or 1; this is represented as:

∀ t Terminal(t) → Signal(t) = 1 ∨ Signal(t) = 0.

o Connect predicates are commutative:

∀ t1, t2 Connect(t1, t2) → Connect (t2, t1).

o Representation of types of gates:

∀ g Gate(g) ∧ r = Type(g) → r = OR ∨ r = AND ∨ r = XOR ∨ r = NOT.


o Output of an AND gate will be zero if and only if any of its inputs is zero.

∀ g Gate(g) ∧ Type(g) = AND →Signal (Out(1, g))= 0 ⇔ ∃n Signal (In(n, g))= 0.


o Output of an OR gate is 1 if and only if any of its inputs is 1:

∀ g Gate(g) ∧ Type(g) = OR → Signal (Out(1, g))= 1 ⇔ ∃n Signal (In(n, g))= 1


o Output of XOR gate is 1 if and only if its inputs are different:

∀ g Gate(g) ∧ Type(g) = XOR → Signal (Out(1, g)) = 1 ⇔ Signal (In(1, g)) ≠ Signal (In(2, g)).
o Output of a NOT gate is the inverse of its input:

∀ g Gate(g) ∧ Type(g) = NOT → Signal (In(1, g)) ≠ Signal (Out(1, g)).


o All the gates in the above circuit have two inputs and one output (except NOT gate).

∀ g Gate(g) ∧ Type(g) = NOT → Arity(g, 1, 1)


∀ g Gate(g) ∧ r =Type(g) ∧ (r= AND ∨r= OR ∨r= XOR) → Arity (g, 2, 1).
o All gates are logic circuits:

∀ g Gate(g) → Circuit (g).

5. Encode a description of the problem instance:

Now we encode the problem of circuit C1: first, we categorize the circuit and its gate components. This step is easy if
the ontology of the problem has already been thought out. It involves writing simple atomic sentences about instances
of the concepts in the ontology.

For the given circuit C1, we can encode the problem instance in atomic sentences as below:

Since the circuit contains two XOR gates, two AND gates, and one OR gate, the atomic sentences for these gates will be:

For the XOR gates: Type(X1) = XOR, Type(X2) = XOR

For the AND gates: Type(A1) = AND, Type(A2) = AND
For the OR gate: Type(O1) = OR.

And then represent the connections between all the gates.


Note: Ontology defines a particular theory of the nature of existence.

6. Pose queries to the inference procedure and get answers:

In this step, we will find all possible combinations of values of the terminals of the adder circuit. The first query will be:

What combination of inputs would generate the first output of circuit C1 as 0 and the second output as 1?

∃ i1, i2, i3 Signal (In(1, C1))=i1 ∧ Signal (In(2, C1))=i2 ∧ Signal (In(3, C1))= i3
∧ Signal (Out(1, C1)) =0 ∧ Signal (Out(2, C1))=1
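
To make step 6 concrete, here is a minimal Python sketch (an illustration, not part of the original axioms) that simulates the standard one-bit full adder wiring assumed above (XOR gates X1 and X2, AND gates A1 and A2, OR gate O1) and enumerates the input combinations that satisfy the query:

from itertools import product

# One-bit full adder built from the gates named in the notes:
# X1, X2 (XOR), A1, A2 (AND), O1 (OR).
def full_adder(i1, i2, i3):
    x1 = i1 ^ i2              # gate X1
    sum_bit = x1 ^ i3         # gate X2 -> Out(1, C1), the sum bit
    a1 = i1 & i2              # gate A1
    a2 = x1 & i3              # gate A2
    carry = a1 | a2           # gate O1 -> Out(2, C1), the carry bit
    return sum_bit, carry

# The query: which inputs give Out(1, C1) = 0 and Out(2, C1) = 1?
for i1, i2, i3 in product([0, 1], repeat=3):
    if full_adder(i1, i2, i3) == (0, 1):
        print(i1, i2, i3)     # the three assignments with exactly two inputs high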

7. Debug the knowledge base:

Now we will debug the knowledge base; this is the last step of the complete process, in which we try to find and fix any
errors in the knowledge base.

In the knowledge base, we may have omitted assertions like 1 ≠ 0.

8. Inferences in first-order logic

Inference in First-Order Logic

Inference in first-order logic is used to deduce new facts or sentences from existing sentences. Before understanding
the FOL inference rules, let's understand some basic terminology used in FOL.

Substitution:

Substitution is a fundamental operation performed on terms and formulas, and it occurs in all inference systems in first-
order logic. Substitution is more complex in the presence of quantifiers in FOL. If we write F[a/x], it refers to
substituting the constant "a" for the variable "x" in F.
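
As an illustration, substitution can be implemented as a recursive replacement of variables. The following is a minimal Python sketch; the tuple representation of terms is an assumption made for the example, not anything prescribed by the notes:

# A term is a variable (lowercase string), a constant (capitalized
# string), or a (functor, arg1, arg2, ...) tuple.
def subst(theta, term):
    if isinstance(term, tuple):                   # compound term: recurse on args
        return (term[0],) + tuple(subst(theta, t) for t in term[1:])
    return theta.get(term, term)                  # variable (replace) or constant

# F[a/x]: substitute the constant "A" for the variable "x" in P(x, f(x, y))
print(subst({"x": "A"}, ("P", "x", ("f", "x", "y"))))
# -> ('P', 'A', ('f', 'A', 'y'))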

Note: First-order logic is capable of expressing facts about some or all objects in the universe.

Equality:

First-order logic does not only use predicates and terms for making atomic sentences; it also provides equality. We
can use the equality symbol to specify that two terms refer to the same object.

Example: Brother (John) = Smith.

As in the above example, the object referred to by Brother(John) is the same as the object referred to by Smith. The
equality symbol can also be used with negation to represent that two terms are not the same object.

Example: ¬(x = y), which is equivalent to x ≠ y.


FOL inference rules for quantifier:

As in propositional logic, we also have inference rules in first-order logic. Following are some basic inference rules in
FOL:

o Universal Generalization
o Universal Instantiation
o Existential Instantiation
o Existential introduction

1. Universal Generalization:

o Universal generalization is a valid inference rule which states that if premise P(c) is true for any arbitrary
element c in the universe of discourse, then we can conclude ∀x P(x).

o It can be represented as: P(c), for an arbitrary c ⟹ ∀x P(x).

o This rule can be used if we want to show that every element has a similar property.
o In this rule, x must not appear as a free variable.

Example: Let's represent, P(c): "A byte contains 8 bits", so for ∀ x P(x) "All bytes contain 8 bits.", it will also be true.

2. Universal Instantiation:

o Universal instantiation, also called universal elimination or UI, is a valid inference rule. It can be applied
multiple times to add new sentences.
o The new KB is logically equivalent to the previous KB.
o As per UI, we can infer any sentence obtained by substituting a ground term for the variable.
o The UI rule states that we can infer any sentence P(c) from ∀x P(x) by substituting a ground term c (a constant
within the domain of x) for any object in the universe of discourse.

o It can be represented as: ∀x P(x) ⟹ P(c).


o Example 1:
If "Every person likes ice-cream", ∀x P(x), then we can infer that
"John likes ice-cream", P(John).
o Example 2:
Let's take a famous example:
"All kings who are greedy are evil." Let our knowledge base contain this detail in the form of FOL:

∀x king(x) ∧ greedy (x) → Evil (x),

So from this information, we can infer any of the following statements using Universal Instantiation:

o King(John) ∧ Greedy (John) → Evil (John),


o King(Richard) ∧ Greedy (Richard) → Evil (Richard),
o King(Father(John)) ∧ Greedy (Father(John)) → Evil (Father(John)),

3. Existential Instantiation:


Existential instantiation, also called existential elimination, is a valid inference rule in first-order logic.

o It can be applied only once to replace the existential sentence.


o The new KB is not logically equivalent to the old KB, but it is satisfiable if the old KB was satisfiable.
o This rule states that one can infer P(c) from a formula of the form ∃x P(x), for a new constant
symbol c.
o The restriction with this rule is that the c used in the rule must be a new term that does not occur elsewhere in the knowledge base.

o It can be represented as: ∃x P(x) ⟹ P(c), for a new constant symbol c.

Example:

From the given sentence: ∃x Crown(x) ∧ OnHead(x, John),

So we can infer: Crown(K) ∧ OnHead( K, John), as long as K does not appear in the knowledge base.

o The K used above is a constant symbol, which is called a Skolem constant.

o Existential instantiation is a special case of the Skolemization process.

4. Existential introduction

o Existential introduction, also known as existential generalization, is a valid inference rule in
first-order logic.
o This rule states that if there is some element c in the universe of discourse which has a property P, then we
can infer that there exists something in the universe which has the property P.

o It can be represented as: P(c) ⟹ ∃x P(x).


o Example: Let's say that,
"Priyanka got good marks in English."
"Therefore, someone got good marks in English."

Generalized Modus Ponens Rule:

For the inference process in FOL, we have a single inference rule which is called Generalized Modus Ponens. It is a lifted
version of Modus Ponens.

Generalized Modus Ponens can be summarized as, " P implies Q and P is asserted to be true, therefore Q must be
True."

According to Generalized Modus Ponens, for atomic sentences pi, pi', and q, where there is a substitution θ such that SUBST(θ, pi')
= SUBST(θ, pi) for all i, the rule can be represented as:

p1', p2', ..., pn', (p1 ∧ p2 ∧ ... ∧ pn → q) ⟹ SUBST(θ, q)

Example:


We will use this rule for the greedy-kings example: we will find some x such that x is a king and x is greedy, so that we can
infer that x is evil.

Here, let's say:

p1' is King(John), p1 is King(x)
p2' is Greedy(y), p2 is Greedy(x)
θ is {x/John, y/John}, q is Evil(x)
SUBST(θ, q) is Evil(John).
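
The substitution θ above is found by unification. Below is a minimal unification sketch in Python; the term representation (lowercase strings as variables, capitalized strings as constants, tuples as compound terms) is an assumption made for illustration:

def is_var(t):
    return isinstance(t, str) and t[0].islower()

def unify(x, y, theta):
    if theta is None:
        return None                     # an earlier pair failed to unify
    if x == y:
        return theta
    if is_var(x):
        return unify_var(x, y, theta)
    if is_var(y):
        return unify_var(y, x, theta)
    if isinstance(x, tuple) and isinstance(y, tuple) and len(x) == len(y):
        for xi, yi in zip(x, y):
            theta = unify(xi, yi, theta)
        return theta
    return None                         # mismatched constants or arities

def unify_var(v, t, theta):
    if v in theta:
        return unify(theta[v], t, theta)
    if is_var(t) and t in theta:
        return unify(v, theta[t], theta)
    new = dict(theta)
    new[v] = t                          # bind the variable
    return new

theta = unify(("King", "John"), ("King", "x"), {})      # p1' against p1
theta = unify(("Greedy", "y"), ("Greedy", "x"), theta)  # p2' against p2
print(theta)   # {'x': 'John', 'y': 'John'}, so SUBST(theta, Evil(x)) = Evil(John)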

9. Forward chaining and Backward chaining

Forward Chaining and backward chaining in AI

In artificial intelligence, forward and backward chaining are important topics. Before understanding
forward and backward chaining, let's first understand where these two terms come from.

Inference engine:

The inference engine is the component of an intelligent system in artificial intelligence which applies logical rules to
the knowledge base to infer new information from known facts. The first inference engines were components of expert
systems. An inference engine commonly proceeds in two modes, which are:

A. Forward chaining
B. Backward chaining

Horn Clause and Definite clause:

Horn clauses and definite clauses are forms of sentences which enable a knowledge base to use a more restricted
and efficient inference algorithm. Logical inference algorithms use forward and backward chaining approaches, which
require the KB to be in the form of first-order definite clauses.

Definite clause: A clause which is a disjunction of literals with exactly one positive literal is known as a definite
clause or strict Horn clause.

Horn clause: A clause which is a disjunction of literals with at most one positive literal is known as a Horn clause.
Hence all definite clauses are Horn clauses.

Example: (¬ p V ¬ q V k). It has only one positive literal k.

It is equivalent to p ∧ q → k.

A. Forward Chaining

Forward chaining is also known as a forward deduction or forward reasoning method when using an inference engine.
Forward chaining is a form of reasoning which start with atomic sentences in the knowledge base and applies
inference rules (Modus Ponens) in the forward direction to extract more data until a goal is reached.

The forward-chaining algorithm starts from known facts, triggers all rules whose premises are satisfied, and adds their
conclusions to the known facts. This process repeats until the problem is solved.

Properties of Forward-Chaining:


o It is a bottom-up approach, as it moves from the facts at the bottom up to the goal at the top.


o It is a process of drawing a conclusion from known facts or data, starting from the initial state and
reaching the goal state.
o The forward-chaining approach is also called data-driven, as we reach the goal using the available data.
o The forward-chaining approach is commonly used in expert systems, such as CLIPS, and in business and production
rule systems.

Consider the following famous example which we will use in both approaches:

Example:

"As per the law, it is a crime for an American to sell weapons to hostile nations. Country A, an enemy of
America, has some missiles, and all the missiles were sold to it by Robert, who is an American citizen."

Prove that "Robert is criminal."

To solve the above problem, first, we will convert all the above facts into first-order definite clauses, and then we will
use a forward-chaining algorithm to reach the goal.

Facts Conversion into FOL:


o It is a crime for an American to sell weapons to hostile nations. (Let's say p, q, and r are variables.)
American(p) ∧ Weapon(q) ∧ Sells(p, q, r) ∧ Hostile(r) → Criminal(p) ...(1)
o Country A has some missiles: ∃p Owns(A, p) ∧ Missile(p). This can be written as two definite clauses by using
Existential Instantiation, introducing a new constant T1.
Owns(A, T1) ......(2)
Missile(T1) .......(3)
o All of the missiles were sold to country A by Robert.
∀p Missile(p) ∧ Owns(A, p) → Sells(Robert, p, A) ......(4)
o Missiles are weapons.
Missile(p) → Weapon(p) .......(5)
o An enemy of America is known as hostile.
Enemy(p, America) → Hostile(p) ........(6)
o Country A is an enemy of America.
Enemy(A, America) .........(7)
o Robert is American.
American(Robert) ..........(8)

Forward chaining proof:

Step-1:

In the first step we will start with the known facts and choose the sentences which do not have implications, such
as American(Robert), Enemy(A, America), Owns(A, T1), and Missile(T1). These facts form the first level of the proof.


Step-2:

At the second step, we will add those facts which can be inferred from the available facts, i.e., rules whose premises are satisfied.

Rule-(1) does not have its premises satisfied yet, so nothing is added from it in the first iteration.

Facts (2) and (3) are already added.

Rule-(4) is satisfied with the substitution {p/T1}, so Sells(Robert, T1, A) is added; it follows from the conjunction of
facts (2) and (3).

Rule-(6) is satisfied with the substitution {p/A}, so Hostile(A) is added; it follows from fact (7).

Step-3:

At step-3, as we can check, Rule-(1) is satisfied with the substitution {p/Robert, q/T1, r/A}, so we can add
Criminal(Robert), which follows from all the available facts. Hence we have reached our goal statement.

Hence it is proved that Robert is a criminal using the forward chaining approach.
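
The same proof can be sketched in Python. For brevity this sketch propositionalizes the rules (the substitutions {p/T1}, {p/A}, and {p/Robert, q/T1, r/A} are applied in advance), so it illustrates the fixed-point loop of forward chaining rather than a full first-order chainer:

# Each rule is (set of premises, conclusion) over ground facts.
rules = [
    ({"American(Robert)", "Weapon(T1)", "Sells(Robert,T1,A)", "Hostile(A)"},
     "Criminal(Robert)"),                                   # Rule (1)
    ({"Missile(T1)", "Owns(A,T1)"}, "Sells(Robert,T1,A)"),  # Rule (4)
    ({"Missile(T1)"}, "Weapon(T1)"),                        # Rule (5)
    ({"Enemy(A,America)"}, "Hostile(A)"),                   # Rule (6)
]
facts = {"American(Robert)", "Owns(A,T1)", "Missile(T1)", "Enemy(A,America)"}

changed = True
while changed:                    # repeat until no rule adds a new fact
    changed = False
    for premises, conclusion in rules:
        if premises <= facts and conclusion not in facts:
            facts.add(conclusion)
            changed = True

print("Criminal(Robert)" in facts)   # True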

B. Backward Chaining:

Backward-chaining is also known as a backward deduction or backward reasoning method when using an inference
engine. A backward chaining algorithm is a form of reasoning, which starts with the goal and works backward,
chaining through rules to find known facts that support the goal.

Properties of backward chaining:

o It is known as a top-down approach.


o Backward-chaining is based on modus ponens inference rule.


o In backward chaining, the goal is broken into sub-goals to prove the facts true.
o It is called a goal-driven approach, as the list of goals decides which rules are selected and used.
o The backward-chaining algorithm is used in game theory, automated theorem proving tools, inference engines,
proof assistants, and various AI applications.
o The backward-chaining method mostly uses a depth-first search strategy for proofs.

Example:

In backward-chaining, we will use the same above example, and will rewrite all the rules.

o American(p) ∧ Weapon(q) ∧ Sells(p, q, r) ∧ Hostile(r) → Criminal(p) ...(1)
o Owns(A, T1) ........(2)
o Missile(T1) .......(3)
o ∀p Missile(p) ∧ Owns(A, p) → Sells(Robert, p, A) ......(4)
o Missile(p) → Weapon(p) .......(5)
o Enemy(p, America) → Hostile(p) ........(6)
o Enemy(A, America) .........(7)
o American(Robert) ..........(8)

Backward-Chaining proof:

In Backward chaining, we will start with our goal predicate, which is Criminal(Robert), and then infer further rules.

Step-1:

At the first step, we will take the goal fact, and from the goal fact we will infer other facts, which we will finally prove
true. Our goal fact is "Robert is criminal," so Criminal(Robert) is the goal predicate.

Step-2:

At the second step, we will infer other facts from the goal fact which satisfy the rules. As we can see in Rule-(1), the
goal predicate Criminal(Robert) is present with the substitution {p/Robert}. So we will add all the conjunctive facts below
the first level and replace p with Robert.

Here we can see American (Robert) is a fact, so it is proved here.


Step-3:

At step-3, we will extract the further fact Missile(q), which is inferred from Weapon(q), as it satisfies Rule-(5). Weapon(q)
is also true with the substitution of the constant T1 for q.

Step-4:

At step-4, we can infer the facts Missile(T1) and Owns(A, T1) from Sells(Robert, T1, r), which satisfies Rule-(4) with the
substitution of A in place of r. So these two statements are proved here.


Step-5:

At step-5, we can infer the fact Enemy(A, America) from Hostile(A), which satisfies Rule-(6). And hence all the
statements are proved true using backward chaining.
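
A corresponding backward-chaining sketch in Python over the same propositionalized KB (again an illustration of the goal-directed recursion, not a full first-order prover):

# rules maps each goal to a list of rule bodies that conclude it.
rules = {
    "Criminal(Robert)": [["American(Robert)", "Weapon(T1)",
                          "Sells(Robert,T1,A)", "Hostile(A)"]],
    "Sells(Robert,T1,A)": [["Missile(T1)", "Owns(A,T1)"]],
    "Weapon(T1)": [["Missile(T1)"]],
    "Hostile(A)": [["Enemy(A,America)"]],
}
facts = {"American(Robert)", "Owns(A,T1)", "Missile(T1)", "Enemy(A,America)"}

def prove(goal):
    if goal in facts:                     # a known fact proves itself
        return True
    for body in rules.get(goal, []):      # try each rule that concludes the goal
        if all(prove(subgoal) for subgoal in body):
            return True
    return False

print(prove("Criminal(Robert)"))          # True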

10. Difference between backward chaining and forward chaining

Difference between backward chaining and forward chaining

Following is the difference between the forward chaining and backward chaining:

o Forward chaining, as the name suggests, starts from the known facts and moves forward by applying inference
rules to extract more data, continuing until it reaches the goal, whereas backward chaining starts
from the goal and moves backward through inference rules to determine the facts that satisfy the goal.
o Forward chaining is called a data-driven inference technique, whereas backward chaining is called a goal-
driven inference technique.
o Forward chaining is known as the bottom-up approach, whereas backward chaining is known as the top-
down approach.


o Forward chaining uses breadth-first search strategy, whereas backward chaining uses depth-first
search strategy.
o Forward and backward chaining both applies Modus ponens inference rule.
o Forward chaining can be used for tasks such as planning, design process monitoring, diagnosis, and
classification, whereas backward chaining can be used for classification and diagnosis tasks.
o Forward chaining can be like an exhaustive search, whereas backward chaining tries to avoid the unnecessary
path of reasoning.
o In forward-chaining there can be various ASK questions from the knowledge base, whereas in backward
chaining there can be fewer ASK questions.
o Forward chaining is slow, as it checks all the rules, whereas backward chaining is fast, as it checks only the few
required rules.

S.No. | Forward Chaining | Backward Chaining
1. | Starts from known facts and applies inference rules to extract more data until it reaches the goal. | Starts from the goal and works backward through inference rules to find the required facts that support the goal.
2. | It is a bottom-up approach. | It is a top-down approach.
3. | Known as a data-driven inference technique, as we reach the goal using the available data. | Known as a goal-driven technique, as we start from the goal and divide it into sub-goals to extract the facts.
4. | Applies a breadth-first search strategy. | Applies a depth-first search strategy.
5. | Tests all the available rules. | Tests only the few required rules.
6. | Suitable for planning, monitoring, control, and interpretation applications. | Suitable for diagnostic, prescription, and debugging applications.
7. | Can generate an infinite number of possible conclusions. | Generates a finite number of possible conclusions.
8. | Operates in the forward direction. | Operates in the backward direction.
9. | Aimed at any conclusion that follows. | Aimed only at the required data.

11. Resolution

Resolution in FOL

Resolution

Resolution is a theorem-proving technique that proceeds by building refutation proofs, i.e., proofs by contradiction.
It was invented by the mathematician John Alan Robinson in 1965.


Resolution is used when various statements are given and we need to prove a conclusion from those statements.
Unification is a key concept in proofs by resolution. Resolution is a single inference rule which can efficiently operate
on the conjunctive normal form or clausal form.

Clause: A disjunction of literals is called a clause; a clause consisting of a single literal is known as a unit clause.

Conjunctive Normal Form: A sentence represented as a conjunction of clauses is said to be conjunctive normal
form or CNF.

The resolution inference rule:

The resolution rule for first-order logic is simply a lifted version of the propositional rule. Resolution can resolve two
clauses if they contain complementary literals, which are assumed to be standardized apart so that they share no
variables:

l1 ∨ ... ∨ lk,    m1 ∨ ... ∨ mn
SUBST(θ, l1 ∨ ... ∨ li-1 ∨ li+1 ∨ ... ∨ lk ∨ m1 ∨ ... ∨ mj-1 ∨ mj+1 ∨ ... ∨ mn)

where θ = UNIFY(li, ¬mj), i.e., li and mj are complementary literals.

This rule is also called the binary resolution rule because it only resolves exactly two literals.

Example:

We can resolve two clauses which are given below:

[Animal(g(x)) V Loves(f(x), x)] and [¬ Loves(a, b) V ¬ Kills(a, b)]

where the two complementary literals are Loves(f(x), x) and ¬ Loves(a, b).

These literals can be unified with the unifier θ = {a/f(x), b/x}, and resolution will generate the resolvent clause:

[Animal(g(x)) V ¬ Kills(f(x), x)].

Steps for Resolution:


1. Conversion of facts into first-order logic.
2. Convert FOL statements into CNF
3. Negate the statement which needs to prove (proof by contradiction)
4. Draw resolution graph (unification).

To better understand all the above steps, we will take an example in which we will apply resolution.

Example:
a. John likes all kinds of food.
b. Apples and vegetables are food.
c. Anything anyone eats and is not killed is food.
d. Anil eats peanuts and is still alive.


e. Harry eats everything that Anil eats.


Prove by resolution that:
f. John likes peanuts.

Step-1: Conversion of Facts into FOL

In the first step we will convert all the given statements into first-order logic:

a. ∀x food(x) → likes(John, x)
b. food(Apple) Λ food(vegetables)
c. ∀x ∀y eats(x, y) Λ ¬ killed(x) → food(y)
d. eats(Anil, Peanuts) Λ alive(Anil)
e. ∀x eats(Anil, x) → eats(Harry, x)
f. ∀x ¬ killed(x) → alive(x) (added linking rule)
g. ∀x alive(x) → ¬ killed(x) (added linking rule)
h. likes(John, Peanuts)

Step-2: Conversion of FOL into CNF

In first-order logic resolution, it is required to convert the FOL statements into CNF, as the CNF form makes resolution
proofs easier.

o Eliminate all implications (→) and rewrite:


a. ∀x ¬ food(x) V likes(John, x)
b. food(Apple) Λ food(vegetables)
c. ∀x ∀y ¬ [eats(x, y) Λ ¬ killed(x)] V food(y)
d. eats (Anil, Peanuts) Λ alive(Anil)
e. ∀x ¬ eats(Anil, x) V eats(Harry, x)
f. ∀x ¬[¬ killed(x)] V alive(x)
g. ∀x ¬ alive(x) V ¬ killed(x)
h. likes(John, Peanuts).
o Move negation (¬) inwards and rewrite:
a. ∀x ¬ food(x) V likes(John, x)
b. food(Apple) Λ food(vegetables)
c. ∀x ∀y ¬ eats(x, y) V killed(x) V food(y)
d. eats(Anil, Peanuts) Λ alive(Anil)
e. ∀x ¬ eats(Anil, x) V eats(Harry, x)
f. ∀x killed(x) V alive(x)
g. ∀x ¬ alive(x) V ¬ killed(x)
h. likes(John, Peanuts)
o Rename variables or standardize variables:
a. ∀x ¬ food(x) V likes(John, x)
b. food(Apple) Λ food(vegetables)
c. ∀y ∀z ¬ eats(y, z) V killed(y) V food(z)
d. eats(Anil, Peanuts) Λ alive(Anil)


e. ∀w ¬ eats(Anil, w) V eats(Harry, w)
f. ∀g killed(g) V alive(g)
g. ∀k ¬ alive(k) V ¬ killed(k)
h. likes(John, Peanuts)

o Eliminate existential quantifiers (Skolemization).

In this step, we would eliminate any existential quantifier ∃; this process is known as Skolemization. In this
example problem there is no existential quantifier, so all the statements remain the same in this step.
o Drop universal quantifiers.
In this step we drop all universal quantifiers, since all the statements are implicitly universally quantified,
so writing the quantifiers is not needed.
a. ¬ food(x) V likes(John, x)
b. food(Apple)
c. food(vegetables)
d. ¬ eats(y, z) V killed(y) V food(z)
e. eats (Anil, Peanuts)
f. alive(Anil)
g. ¬ eats(Anil, w) V eats(Harry, w)
h. killed(g) V alive(g)
i. ¬ alive(k) V ¬ killed(k)
j. likes(John, Peanuts).

Note: Statements "food(Apple) Λ food(vegetables)" and "eats (Anil, Peanuts) Λ alive(Anil)" can be written in two separate statements.

o Distribute disjunction V over conjunction Λ.


This step will not make any change in this problem.

Step-3: Negate the statement to be proved

In this step, we apply negation to the conclusion statement, which will be written as ¬ likes(John, Peanuts).

Step-4: Draw Resolution graph:

Now in this step, we will solve the problem using a resolution tree with substitutions. For the above problem, the
resolution proceeds as follows:


Hence the negation of the conclusion has been proved as a complete contradiction with the given set of statements.

Explanation of Resolution graph:


o In the first step of the resolution graph, ¬ likes(John, Peanuts) and likes(John, x) get resolved (canceled) by the
substitution {Peanuts/x}, and we are left with ¬ food(Peanuts).
o In the second step of the resolution graph, ¬ food(Peanuts) and food(z) get resolved (canceled) by the
substitution {Peanuts/z}, and we are left with ¬ eats(y, Peanuts) V killed(y).
o In the third step of the resolution graph, ¬ eats(y, Peanuts) and eats(Anil, Peanuts) get resolved by the
substitution {Anil/y}, and we are left with killed(Anil).
o In the fourth step of the resolution graph, killed(Anil) and ¬ killed(k) get resolved by the substitution {Anil/k},
and we are left with ¬ alive(Anil).
o In the last step of the resolution graph, ¬ alive(Anil) and alive(Anil) get resolved, yielding the empty clause.
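
The refutation above can be reproduced with a small ground-resolution sketch in Python. The clauses are instantiated with the substitutions used in the graph, so no unification is needed; the set-of-literals representation with "~" for negation is an assumption made for illustration:

clauses = {
    frozenset({"~likes(John,Peanuts)"}),                    # negated goal
    frozenset({"~food(Peanuts)", "likes(John,Peanuts)"}),   # (a) with x = Peanuts
    frozenset({"~eats(Anil,Peanuts)", "killed(Anil)",
               "food(Peanuts)"}),                           # (d) with y = Anil, z = Peanuts
    frozenset({"eats(Anil,Peanuts)"}),                      # (e)
    frozenset({"~alive(Anil)", "~killed(Anil)"}),           # (i) with k = Anil
    frozenset({"alive(Anil)"}),                             # (f)
}

def negate(lit):
    return lit[1:] if lit.startswith("~") else "~" + lit

def refutes(cs):
    while True:
        new = set()
        for c1 in cs:
            for c2 in cs:
                for lit in c1:
                    if negate(lit) in c2:                   # complementary literals
                        resolvent = (c1 - {lit}) | (c2 - {negate(lit)})
                        if not resolvent:                   # empty clause: contradiction
                            return True
                        new.add(resolvent)
        if new <= cs:
            return False                                    # saturated, no contradiction
        cs = cs | new

print(refutes(clauses))   # True, so likes(John, Peanuts) follows from the KB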


AL3391 ARTIFICIAL INTELLIGENCE

UNIT V PROBABILISTIC REASONING


Acting under uncertainty – Bayesian inference – naïve Bayes models. Probabilistic reasoning –
Bayesian networks – exact inference in BN – approximate inference in BN – causal networks.

1. Acting under uncertainty

Uncertainty:

Till now, we have learned knowledge representation using first-order logic and propositional
logic with certainty, which means we were sure about the predicates. With this knowledge
representation we might write A → B, which means if A is true then B is true; but consider a
situation where we are not sure whether A is true or not. Then we cannot express this
statement; this situation is called uncertainty.

So to represent uncertain knowledge, where we are not sure about the predicates, we need
uncertain reasoning or probabilistic reasoning.

Causes of uncertainty:

Following are some leading causes of uncertainty to occur in the real world.

1. Information obtained from unreliable sources.


2. Experimental Errors
3. Equipment fault
4. Temperature variation
5. Climate change.

2. Bayesian inference

Bayes' theorem in Artificial intelligence

Bayes' theorem:

Bayes' theorem is also known as Bayes' rule or Bayes' law, and underlies Bayesian reasoning; it determines the probability of
an event given uncertain knowledge.

In probability theory, it relates the conditional probability and marginal probabilities of two random events.

Bayes' theorem was named after the British mathematician Thomas Bayes. The Bayesian inference is an application
of Bayes' theorem, which is fundamental to Bayesian statistics.

It is a way to calculate the value of P(B|A) with the knowledge of P(A|B).


Bayes' theorem allows updating the probability prediction of an event by observing new information of the real world.

Example: If the probability of cancer depends on a person's age, then by using Bayes' theorem we can determine the probability of
cancer more accurately with the help of age.

Bayes' theorem can be derived using product rule and conditional probability of event A with known event B:

As from the product rule we can write:

P(A ⋀ B) = P(A|B) P(B)

Similarly, the probability of event B with known event A:

P(A ⋀ B) = P(B|A) P(A)

Equating the right-hand sides of both equations, we get:

P(A|B) = P(B|A) P(A) / P(B) ...(a)

The above equation (a) is called Bayes' rule or Bayes' theorem. This equation is the basis of most modern AI systems
for probabilistic inference.

It shows the simple relationship between joint and conditional probabilities. Here,

P(A|B) is known as the posterior, which we need to calculate; it is read as the probability of hypothesis A given that
evidence B has occurred.

P(B|A) is called the likelihood: assuming the hypothesis is true, it is the probability of the evidence.

P(A) is called the prior probability: the probability of the hypothesis before considering the evidence.

P(B) is called the marginal probability: the pure probability of the evidence.

In equation (a), in general, we can write P(B) = Σi P(Ai) P(B|Ai); hence Bayes' rule can be written as:

P(Ai|B) = P(B|Ai) P(Ai) / Σk P(Ak) P(B|Ak)

where A1, A2, A3, ..., An is a set of mutually exclusive and exhaustive events.

Applying Bayes' rule:

Bayes' rule allows us to compute the single term P(B|A) in terms of P(A|B), P(B), and P(A). This is very useful in cases
where we have good estimates of these three terms and want to determine the fourth. Suppose we want to
perceive the effect of some unknown cause and want to compute that cause; then Bayes' rule becomes:

P(cause|effect) = P(effect|cause) P(cause) / P(effect)


Example-1:

Question: What is the probability that a patient has meningitis, given that the patient has a stiff neck?

Given Data:

A doctor is aware that the disease meningitis causes a patient to have a stiff neck 80% of the time. He is
also aware of some more facts, which are given as follows:

o The Known probability that a patient has meningitis disease is 1/30,000.


o The Known probability that a patient has a stiff neck is 2%.

Let a be the proposition that the patient has a stiff neck and b the proposition that the patient has meningitis; then we can
calculate the following:

P(a|b) = 0.8

P(b) = 1/30000

P(a) = 0.02

Applying Bayes' rule: P(b|a) = P(a|b) P(b) / P(a) = (0.8 × 1/30000) / 0.02 ≈ 0.0013.

Hence, we can assume that roughly 1 patient out of 750 patients with a stiff neck has meningitis.
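
The same calculation in Python, as a direct transcription of Bayes' rule with the numbers given above:

p_a_given_b = 0.8       # P(stiff neck | meningitis)
p_b = 1 / 30000         # P(meningitis)
p_a = 0.02              # P(stiff neck)

p_b_given_a = p_a_given_b * p_b / p_a   # Bayes' rule
print(p_b_given_a)      # ~0.00133, i.e. about 1 in 750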

Example-2:

Question: From a standard deck of playing cards, a single card is drawn. The probability that the card is a king is
4/52. Calculate the posterior probability P(King|Face), i.e., the probability that a drawn face card is a king.

Solution:

P(King): probability that the card is a king = 4/52 = 1/13

P(Face): probability that a card is a face card = 12/52 = 3/13

P(Face|King): probability that a card is a face card given that it is a king = 1

Putting all values into Bayes' rule, we get:

P(King|Face) = P(Face|King) P(King) / P(Face) = (1 × 1/13) / (3/13) = 1/3.


Application of Bayes' theorem in Artificial intelligence:

Following are some applications of Bayes' theorem:

o It is used to calculate the next step of the robot when the already executed step is given.
o Bayes' theorem is helpful in weather forecasting.
o It can solve the Monty Hall problem.

3. Probabilistic reasoning

Probabilistic reasoning:

Probabilistic reasoning is a way of knowledge representation where we apply the concept of probability to indicate
the uncertainty in knowledge. In probabilistic reasoning, we combine probability theory with logic to handle the
uncertainty.

We use probability in probabilistic reasoning because it provides a way to handle the uncertainty that is the result of
someone's laziness and ignorance.

In the real world, there are lots of scenarios where the certainty of something is not confirmed, such as "It will rain
today," "the behavior of someone in some situation," or "a match between two teams or two players." These are probable
sentences for which we can assume that they will happen but cannot be sure, so here we use probabilistic reasoning.

Need of probabilistic reasoning in AI:

o When there are unpredictable outcomes.


o When the specifications or possibilities of predicates become too large to handle.
o When an unknown error occurs during an experiment.

In probabilistic reasoning, there are two ways to solve problems with uncertain knowledge:

o Bayes' rule
o Bayesian Statistics

As probabilistic reasoning uses probability and related terms, so before understanding probabilistic reasoning, let's
understand some common terms:

Probability: Probability can be defined as the chance that an uncertain event will occur. It is the numerical measure of
the likelihood that an event will occur. The value of a probability always lies between 0 and 1, the two ideal limits of
impossibility and certainty.


0 ≤ P(A) ≤ 1, where P(A) is the probability of an event A.

P(A) = 0, indicates total uncertainty in an event A.

P(A) =1, indicates total certainty in an event A.

We can find the probability of an uncertain event by using the formula below:

Probability of occurrence = Number of desired outcomes / Total number of outcomes

o P(¬A) = probability of a not happening event.


o P(¬A) + P(A) = 1.

Event: Each possible outcome of a variable is called an event.

Sample space: The collection of all possible events is called sample space.

Random variables: Random variables are used to represent the events and objects in the real world.

Prior probability: The prior probability of an event is the probability computed before observing new information.

Posterior probability: The probability that is calculated after all evidence or information has been taken into account. It is
a combination of the prior probability and the new information.

Conditional probability:

Conditional probability is the probability of an event occurring given that another event has already happened.

Let's suppose we want to calculate the probability of event A when event B has already occurred, "the probability of A under the
conditions of B". It can be written as:

P(A|B) = P(A ⋀ B) / P(B)

where P(A ⋀ B) = joint probability of A and B, and P(B) = marginal probability of B.

If the probability of A is given and we need to find the probability of B, then it will be given as:

P(B|A) = P(A ⋀ B) / P(A)

This can be explained using a Venn diagram: when B has occurred, the sample space is reduced to the set B, and we can
calculate event A given B by dividing the probability P(A ⋀ B) by P(B).


Example:

In a class, 70% of the students like English and 40% of the students like both English and mathematics. What
percentage of the students who like English also like mathematics?

Solution:

Let A be the event that a student likes mathematics and B the event that a student likes English.

P(A|B) = P(A ⋀ B) / P(B) = 0.4 / 0.7 = 0.57

Hence, 57% of the students who like English also like mathematics.

4. Bayesian networks or Belief networks

Bayesian Belief Network in artificial intelligence

A Bayesian belief network is a key computer technology for dealing with probabilistic events and for solving problems
that involve uncertainty. We can define a Bayesian network as:

"A Bayesian network is a probabilistic graphical model which represents a set of variables and their conditional
dependencies using a directed acyclic graph."

It is also called a Bayes network, belief network, decision network, or Bayesian model.

Bayesian networks are probabilistic, because these networks are built from a probability distribution, and also use
probability theory for prediction and anomaly detection.

Real world applications are probabilistic in nature, and to represent the relationship between multiple events, we need
a Bayesian network. It can also be used in various tasks including prediction, anomaly detection, diagnostics,
automated insight, reasoning, time series prediction, and decision making under uncertainty.


Bayesian Network can be used for building models from data and experts opinions, and it consists of two parts:

o Directed Acyclic Graph


o Table of conditional probabilities.

The generalized form of Bayesian network that represents and solve decision problems under uncertain knowledge is
known as an Influence diagram.

A Bayesian network graph is made up of nodes and Arcs (directed links), where:

o Each node corresponds to the random variables, and a variable can be continuous or discrete.
o Arc or directed arrows represent the causal relationship or conditional probabilities between random
variables. These directed links or arrows connect the pair of nodes in the graph.
These links represent that one node directly influences the other node; if there is no directed link, the
nodes are independent of each other.
o In the above diagram, A, B, C, and D are random variables represented by the nodes of the
network graph.
o If we are considering node B, which is connected with node A by a directed arrow, then node
A is called the parent of Node B.
o Node C is independent of node A.

Note: The Bayesian network graph does not contain any cycles. Hence, it is known as a directed acyclic
graph, or DAG.

The Bayesian network has mainly two components:

o Causal Component
o Actual numbers

Each node in the Bayesian network has a conditional probability distribution P(Xi | Parents(Xi)), which determines the
effect of the parents on that node.

Bayesian network is based on Joint probability distribution and conditional probability. So let's first understand the
joint probability distribution:


Joint probability distribution:

If we have variables x1, x2, x3, ..., xn, then the probabilities of the different combinations of x1, x2, x3, ..., xn are known as the
joint probability distribution.

P[x1, x2, x3, ..., xn] can be written as follows in terms of conditional probabilities:

= P[x1 | x2, x3, ..., xn] P[x2, x3, ..., xn]

= P[x1 | x2, x3, ..., xn] P[x2 | x3, ..., xn] ... P[xn-1 | xn] P[xn]

In general, for each variable Xi in a Bayesian network, we can write the equation as:

P(Xi | Xi-1, ..., X1) = P(Xi | Parents(Xi))

Explanation of Bayesian network:

Let's understand the Bayesian network through an example by creating a directed acyclic graph:

Example: Harry installed a new burglar alarm at his home to detect burglary. The alarm responds reliably to
a burglary, but it also responds to minor earthquakes. Harry has two neighbors, David and Sophia, who have taken
responsibility for informing Harry at work when they hear the alarm. David always calls Harry when he hears the alarm,
but sometimes he gets confused with the phone ringing and calls then too. On the other hand, Sophia likes to
listen to loud music, so sometimes she misses the alarm. Here we would like to compute the probability of the
burglary alarm.

Problem:

Calculate the probability that the alarm has sounded but neither a burglary nor an earthquake has occurred, and both David and Sophia have called Harry.

Solution:

o The Bayesian network for the above problem is given below. The network structure shows that burglary
and earthquake are the parent nodes of the alarm and directly affect the probability of the alarm going off,
while David's and Sophia's calls depend on the alarm probability.
o The network represents that our assumed agents do not directly perceive the burglary, do not
notice the minor earthquake, and do not confer before calling.
o The conditional distributions for each node are given as a conditional probabilities table, or CPT.
o Each row in a CPT must sum to 1, because the entries in the row represent an exhaustive set of
cases for the variable.
o In a CPT, a boolean variable with k boolean parents contains 2^k rows of probabilities. Hence, if there are two parents,
the CPT will contain 4 rows of probability values.

List of all events occurring in this network:

o Burglary (B)
o Earthquake(E)
o Alarm(A)
o David Calls(D)


o Sophia calls(S)

We can write the events of the problem statement in the form of the probability P[D, S, A, B, E], and can rewrite this
probability statement using the joint probability distribution:

P[D, S, A, B, E] = P[D | S, A, B, E] · P[S, A, B, E]

= P[D | S, A, B, E] · P[S | A, B, E] · P[A, B, E]

= P[D | A] · P[S | A, B, E] · P[A, B, E]

= P[D | A] · P[S | A] · P[A | B, E] · P[B, E]

= P[D | A] · P[S | A] · P[A | B, E] · P[B | E] · P[E]

= P[D | A] · P[S | A] · P[A | B, E] · P[B] · P[E], since B and E are independent.

Let's take the observed probability for the Burglary and earthquake component:

P(B= True) = 0.002, which is the probability of burglary.

P(B= False)= 0.998, which is the probability of no burglary.

P(E= True)= 0.001, which is the probability of a minor earthquake

P(E= False)= 0.999, Which is the probability that an earthquake not occurred.

We can provide the conditional probabilities as per the below tables:

Conditional probability table for Alarm A:

The Conditional probability of Alarm A depends on Burglar and earthquake:


B E P(A= True) P(A= False)

True True 0.94 0.06

True False 0.95 0.05

False True 0.31 0.69

False False 0.001 0.999

Conditional probability table for David Calls:

The Conditional probability of David that he will call depends on the probability of Alarm.

A P(D= True) P(D= False)

True 0.91 0.09

False 0.05 0.95

Conditional probability table for Sophia Calls:

The Conditional probability of Sophia that she calls is depending on its Parent Node "Alarm."

A P(S= True) P(S= False)

True 0.75 0.25

False 0.02 0.98

From the formula of the joint distribution, we can write the problem statement in the form of a probability distribution:

P(S, D, A, ¬B, ¬E) = P(S|A) · P(D|A) · P(A|¬B ∧ ¬E) · P(¬B) · P(¬E)

= 0.75* 0.91* 0.001* 0.998*0.999

= 0.00068045.

Hence, a Bayesian network can answer any query about the domain by using the joint distribution.
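
A minimal Python sketch of this query, with the CPTs above stored as dictionaries (the representation is an illustration, not a standard library API):

P_B = {True: 0.002, False: 0.998}                  # P(B)
P_E = {True: 0.001, False: 0.999}                  # P(E)
P_A = {(True, True): 0.94, (True, False): 0.95,
       (False, True): 0.31, (False, False): 0.001} # P(A=True | B, E)
P_D = {True: 0.91, False: 0.05}                    # P(D=True | A)
P_S = {True: 0.75, False: 0.02}                    # P(S=True | A)

# P(S, D, A, ~B, ~E) = P(S|A) P(D|A) P(A|~B,~E) P(~B) P(~E)
b, e, a = False, False, True
p = P_S[a] * P_D[a] * P_A[(b, e)] * P_B[b] * P_E[e]
print(p)   # 0.75 * 0.91 * 0.001 * 0.998 * 0.999 ≈ 0.00068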

The semantics of Bayesian Network:

There are two ways to understand the semantics of the Bayesian network, which is given below:

1. To understand the network as the representation of the Joint probability distribution.

It is helpful to understand how to construct the network.

2. To understand the network as an encoding of a collection of conditional independence statements.

It is helpful in designing inference procedure.


5. Inference in Bayesian Networks


1. Exact inference
2. Approximate inference

1. Exact inference:

In exact inference, we analytically compute the conditional probability distribution over the
variables of interest.

But sometimes, that’s too hard to do, in which case we can use approximation techniques based on
statistical sampling

Given a Bayesian network, what questions might we want to ask?

• Conditional probability query: P(x | e)

• Maximum a posteriori probability: What value of x maximizes P(x|e) ?

General question: What’s the whole probability distribution over variable X given evidence e, P(X |
e)?

In our discrete probability situation, the only way to answer a MAP query is to compute the
probability of x given e for all possible values of x and see which one is greatest

So, in general, we’d like to be able to compute a whole probability distribution over some variable or
variables X, given instantiations of a set of variables e

Using the joint distribution

To answer any query involving a conjunction of variables, sum over the variables not involved in the
query

Given the joint distribution over the variables, we can easily answer any question about the value of
a single variable by summing (or marginalizing) over the other variables.

So, in a domain with four variables, A, B, C, and D, the probability that variable D has value d is the
sum over all possible combinations of values of the other three variables of the joint probability of
all four values. This is exactly the same as the procedure we went through in the last lecture, where
to compute the probability of cavity, we added up the probability of cavity and toothache and the
probability of cavity and not toothache.


In general, we’ll use the first notation, with a single summation indexed by a list of variable names,
and a joint probability expression that mentions values of those variables. But here we can see the
completely written-out definition, just so we all know what the shorthand is supposed to mean.

To compute a conditional probability, we reduce it to a ratio of conjunctive queries using the


definition of conditional probability, and then answer each of those queries by marginalizing out the
variables not mentioned.

In the numerator, here, you can see that we’re only summing over variables A and C, because b and
d are instantiated in the query.

We’re going to learn a general purpose algorithm for answering these joint queries fairly efficiently.
We’ll start by looking at a very simple case to build up our intuitions, then we’ll write down the
algorithm, then we’ll apply it to a more complex case.

Here's our very simple case. It's a Bayes net with four nodes, arranged in a chain: A → B → C → D.


So, we know from before that the probability that variable D has some value little d is the sum over
A, B, and C of the joint distribution, with d fixed.

Now, using the chain rule of Bayesian networks, we can write down the joint probability as a
product over the nodes of the probability of each node’s value given the values of its parents. So, in
this case, we get P(d|c) times P(c|b) times P(b|a) times P(a).

This expression gives us a method for answering the query, given the conditional probabilities that
are stored in the net. And this method can be applied directly to any other Bayes net. But there's a
problem with it: it requires enumerating all possible combinations of assignments to A, B, and C, and
then, for each one, multiplying the factors for each node. That's an enormous amount of work and
we'd like to avoid it if at all possible.

So, we’ll try rewriting the expression into something that might be more efficient to evaluate. First,
we can make our summation into three separate summations, one over each variable.

Then, by distributivity of addition over multiplication, we can push the summations in, so that the
sum over A includes all the terms that mention A, but no others, and so on. It’s pretty clear that this
expression is the same as the previous one in value, but it can be evaluated more efficiently. We’re
still, eventually, enumerating all assignments to the three variables, but we’re doing somewhat
fewer multiplications than before. So this is still not completely satisfactory.

If you look, for a minute, at the terms inside the summation over A, you’ll see that we’re doing these
multiplications over for each value of C, which isn’t necessary, because they’re independent of C.
Our idea, here, is to do the multiplications once and store them for later use. So, first, for each value
of A and B, we can compute the product, generating a two dimensional matrix.


Then, we can sum over the rows of the matrix, yielding one value of the sum for each possible value
of b.

We’ll call this set of values, which depends on b, f1 of b.

Now, we can substitute f1 of b in for the sum over A in our previous expression. And, effectively, we
can remove node A from our diagram. Now, we express the contribution of b, which takes the
contribution of a into account, as f_1 of b.

We can continue the process in basically the same way. We can look at the summation over b and
see that the only other variable it involves is c. We can summarize those products as a set of factors,
one for each value of c. We’ll call those factors f_2 of c.

We substitute f_2 of c into the formula, remove node b from the diagram, and now we’re down to a
simple expression in which d is known and we have to sum over values of c.


Variable Elimination Algorithm

Given a Bayesian network and an elimination order X1, ..., Xm for the non-query variables, compute

Σ_{xm} ... Σ_{x1} Π_j P(xj | parents(Xj))

For i = m downto 1

 remove all the factors that mention Xi


 multiply those factors, getting a value for each combination of mentioned variables
 sum over Xi
 put this new factor into the factor set

That was a simple special case. Now we can look at the algorithm in the general case. Let’s assume
that we’re given a Bayesian network and an ordering on the variables that aren’t fixed in the query.
We’ll come back later to the question of the influence of the order, and how we might find a good
one.

We can express the probability of the query variables as a sum over each value of each of the non-
query variables of a product over each node in the network, of the probability that that variable has
the given value given the values of its parents.

So, we’ll eliminate the variables from the inside out. Starting with variable Xm and finishing with
variable X1.

To eliminate variable Xi, we start by gathering up all of the factors that mention Xi, and removing
them from our set of factors. Let’s say there are k such factors.

Now, we make a k+1 dimensional table, indexed by Xi as well as each of the other variables that is
mentioned in our set of factors.

We then sum the table over the Xi dimension, resulting in a k-dimensional table.

This table is our new factor, and we put a term for it back into our set of factors.

Once we’ve eliminated all the summations, we have the desired value.
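
Here is a minimal Python sketch of this elimination on the four-node chain A → B → C → D from the earlier example. The CPT numbers are made-up placeholders, since the lecture does not give any:

P_A = {0: 0.6, 1: 0.4}                                       # P(a)
P_B = {(0, 0): 0.7, (0, 1): 0.3, (1, 0): 0.2, (1, 1): 0.8}   # P(b|a), key (a, b)
P_C = {(0, 0): 0.9, (0, 1): 0.1, (1, 0): 0.4, (1, 1): 0.6}   # P(c|b), key (b, c)
P_D = {(0, 0): 0.5, (0, 1): 0.5, (1, 0): 0.1, (1, 1): 0.9}   # P(d|c), key (c, d)

# f1(b) = sum_a P(a) P(b|a)   -- eliminate A
f1 = {b: sum(P_A[a] * P_B[(a, b)] for a in (0, 1)) for b in (0, 1)}
# f2(c) = sum_b f1(b) P(c|b)  -- eliminate B
f2 = {c: sum(f1[b] * P_C[(b, c)] for b in (0, 1)) for c in (0, 1)}
# P(d) = sum_c f2(c) P(d|c)   -- eliminate C
P_d = {d: sum(f2[c] * P_D[(c, d)] for c in (0, 1)) for d in (0, 1)}
print(P_d)   # the distribution over D; the two values sum to 1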

One more example


Here’s a more complicated example, to illustrate the variable elimination algorithm in a more
general case. We have this big network that encodes a domain for diagnosing lung disease.
(Dyspnea, as I understand it, is shortness of breath).

We’ll do variable elimination on this graph using elimination order A, B, L, T, S, X, V.

So, we start by eliminating V. We gather the two terms that mention V and see that they also
involve variable T. So, we compute the product for each value of T, and summarize those in the
factor f1 of T.

Now we can substitute that factor in for the summation, and remove the node from the network.

The next variable to be eliminated is X. There is actually only one term involving X, and it also
involves variable A. So, for each value of A, we compute the sum over X of P(x|a). But wait! We
know what this value is! If we fix a and sum over x, these probabilities have to add up to 1.

So, rather than adding another factor to our expression, we can just remove the whole sum. In
general, the only nodes that will have an influence on the probability of D are its ancestors.

Now, it’s time to eliminate S. We find that there are three terms involving S, and we gather them
into the sum. These three terms involve two other variables, B and L. So we have to make a factor
that specifies, for each value of B and L, the value of the sum of products.

We’ll call that factor f_2 of b and l.

Now we can substitute that factor back into our expression. We can also eliminate node S. But in
eliminating S, we've added a direct dependency between L and B (they used to be dependent via S,
but now the dependency is encoded explicitly in f2(b, l)). We'll show that in the graph by drawing a line
between the two nodes. It's not exactly a standard directed conditional dependence, but it's still
useful to show that they're coupled.

Now we eliminate T. It involves two terms, which themselves involve variables A and L. So we make
a new factor f3 of A and L.

We can substitute in that factor and eliminate T. We’re getting close!


Next we eliminate L. It involves these two factors, which depend on variables A and B. So we make a
new factor, f4 of A and B, and substitute it in. We remove node L, but couple A and B.

At this point, we could just do the summations over A and B and be done. But to finish out the
algorithm the way a computer would, it’s time to eliminate variable B.

It involves both of our remaining terms, and it seems to depend on variables A and D. However, in
this case, we’re interested in the probability of a particular value, little d of D, and so the variable d
is instantiated. Thus, we can treat it as a constant in this expression, and we only need to generate a
factor over a, which we’ll call f5 of a. And we can now, in some sense, remove D from our network
as well (because we’ve already factored it into our answer).

Finally, to get the probability that variable D has value little d, we simply sum factor f5 over all
values of a. Yay! We did it.

Properties of Variable Elimination

Let’s see how the variable elimination algorithm performs, both in theory and in practice.

 Time is exponential in size of largest factor


 Bad elimination order can generate huge factors
 NP Hard to find the best elimination order


 Even the best elimination order may generate large factors


 There are reasonable heuristics for picking an elimination order (such as choosing the
variable that results in the smallest next factor)
 Inference in polytrees (nets with no cycles) is linear in size of the network (the largest CPT)
 Many problems with very large nets have only small factors, and thus efficient inference

First of all, it’s pretty easy to see that it runs in time exponential in the number of variables involved in
the largest factor. Creating a factor with k variables involves making a k+1 dimensional table. If you have
b values per variable, that’s a table of size b^(k+1). To make each entry, you have to multiply at most n
numbers, where n is the number of nodes. We have to do this for each variable to be eliminated (which
is usually close to n). So we have something like time = O(n^2 b^k).

How big the factors are depends on the elimination order. You’ll see in one of the recitation exercises
just how dramatic the difference in factor sizes can be. A bad elimination order can generate huge
factors.

So, we’d like to use the elimination order that generates the smallest factors. Unfortunately, it turns out
to be NP hard to find the best elimination order.

At least, there are some fairly reasonable heuristics for choosing an elimination order. It’s usually done
dynamically. So, rather than fixing the elimination order in advance, as we suggested in the algorithm
description, you can pick the next variable to be eliminated depending on the situation. In particular,
one reasonable heuristic is to pick the variable to eliminate next that will result in the smallest factor.
This greedy approach won’t always be optimal, but it’s not usually too bad.

There is one case where Bayes net inference in general, and the variable elimination algorithm in
particular is fairly efficient, and that’s when the network is a polytree. A polytree is a network with no
cycles. That is, a network in which, for any two nodes, there is only one path between them. In a
polytree, inference is linear in the size of the network, where the size of the network is defined to be the
size of the largest conditional probability table (or exponential in the maximum number of parents of
any node). In a polytree, the optimal elimination order is to start at the root nodes, and work
downwards, always eliminating a variable that no longer has any parents. In doing so, we never
introduce additional connections into the network.

So, inference in polytrees is efficient, and even in many large non-polytree networks, it’s possible to
keep the factors small, and therefore to do inference relatively efficiently.

When the network is such that the factors are, of necessity, large, we’ll have to turn to a different class
of methods.

2. Approximate inference:

Sampling

To get approximate answer we can do stochastic simulation (sampling).


Another strategy, which is a theme that comes up more and more in AI, is to say: well, we didn't
really want the exact answer anyway; let's try to do an approximation. You can also show that
it's computationally hard to get an approximation that's within epsilon of the answer that you want, but
again, that doesn't keep us from trying.

So, the other thing that we can do is stochastic simulation, or sampling. In sampling, we look at the
root node of our graph; attached to this root node is some probability that A is going to be true, say 0.4.
So we flip a coin that comes up heads with probability 0.4 and see whether we get true or false.

We flip our coin, let's say, and we get true for A -- this time. And now, given the assignment of true to A,
we look in the conditional probability table for B given A = true, and that gives us a probability for B.

Now, we flip a coin with that probability. Say we get False. We enter that into the table.

We do the same thing for C, and let’s say we get True.

Now, we look in the CPT for D given B and C, for the case where B is false and C is true, and we flip a coin
with that probability, in order to get a value for D.

So, there's one sample from the joint distribution of these four variables. And you can just keep doing
this, all day and all night, and generate a big pile of samples, using that algorithm. And now you can ask
various questions.

Estimate:

P*(D|A) = #D,A / #A

Let's say you want to know the probability of D given A. How would you answer - - given all the
examples -- what would you do to compute the probability of D given A? You would just count. You’d
count the number of cases in which A and D were true, and you’d divide that by the number of cases in


which A was true, and that would give you an unbiased estimate of the probability of D given A. The
more samples, the more confidence you’d have that the estimated probability is close to the true one.
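
A minimal Python sketch of this estimator on the four-node network from the sampling walkthrough (root A, children B and C, and D with parents B and C). The CPT values are illustrative placeholders:

import random

def sample():
    a = random.random() < 0.4                       # P(A=True) = 0.4, as above
    b = random.random() < (0.7 if a else 0.2)       # P(B=True | A)
    c = random.random() < (0.6 if a else 0.1)       # P(C=True | A)
    p_d = {(True, True): 0.9, (True, False): 0.5,
           (False, True): 0.4, (False, False): 0.05}[(b, c)]
    d = random.random() < p_d                       # P(D=True | B, C)
    return a, d

n_a = n_da = 0
for _ in range(100_000):
    a, d = sample()
    if a:
        n_a += 1
        if d:
            n_da += 1

print(n_da / n_a)   # estimate of P(D=True | A=True); improves with more samples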

Estimation

 Some probabilities are easier than others to estimate
 In generating the table, the rare events will not be well represented
 P(Disease | spots-on-your-tongue, sore toe)
 If spots-on-your-tongue and sore toe are not root nodes, you would generate a huge table but
the cases of interest would be very sparse in the table
 Importance sampling lets you focus on the set of cases that are important to answering your
question
It's going to turn out that some probabilities are easier than other ones to estimate.

Exactly because of the process we’re using to generate the samples, the majority of them will be the
typical cases. Oh, it's someone with a cold, someone with a cold, someone with a cold, someone with a
cold, someone with a cold, someone with malaria, someone with a cold, someone with a cold. So the
rare results are not going to come up very often. And so doing this sampling naively can make it really
hard to estimate the probability of a rare event. If it's something that happens one in ten thousand
times, well, you know for sure you're going to need, some number of tens of thousands of samples to
get even a reasonable estimate of that probability.

Imagine that you want to estimate the probability of some disease given -- oh, I don't know -- spots on
your tongue and a sore toe. Somebody walks in and they have a really peculiar set of symptoms, and
you want to know what's the probability that they have some disease.

Well, if the symptoms are root nodes, it's easy. If the symptoms were root nodes, you could just assign
the root nodes to have their observed values and then simulate the rest of the network as before.

But if the symptoms aren't root nodes then if you do naïve sampling, you would generate a giant table
of samples, and you'd have to go and look and say, gosh, how many cases do I have where somebody
has spots on their tongue and a sore toe; and the answer would be, well, maybe zero or not very many.

There’s a technique called importance sampling, which allows you to draw examples from a distribution
that’s going to be more helpful and then reweight them so that you can still get an unbiased estimate of
the desired conditional probability. It’s a bit beyond the scope of this class to get into the details, but it’s
an important and effective idea.
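
Although the details are left out of scope here, a minimal sketch of one common form of importance sampling, likelihood weighting, may make the idea concrete: the evidence variable is clamped to its observed value and each sample is weighted by the probability of that evidence. This reuses the import and CPTs from the sketch above; the setup is an illustrative assumption, not part of the notes.

def weighted_sample(evidence_d=True):
    """Sample the non-evidence variables, fix D, weight by its likelihood."""
    a = random.random() < P_A
    b = random.random() < P_B_given_A[a]
    c = random.random() < P_C_given_A[a]
    p_d = P_D_given_BC[(b, c)]
    weight = p_d if evidence_d else 1.0 - p_d   # likelihood of the evidence
    return a, weight

# Estimate P(A | D=true) as a weight-normalised count: even if D=true is
# rare, every sample contributes, instead of almost all being wasted.
pairs = [weighted_sample() for _ in range(100_000)]
total = sum(w for a, w in pairs)
print(sum(w for a, w in pairs if a) / total)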

Recitation Problem

• Do the variable elimination algorithm on the net below using the elimination order A,B,C (that is,
eliminate node C first). In computing P(D=d), what factors do you get?

• What if you wanted to compute the whole marginal distribution P(D)?


Here’s the network we started with. We used elimination order C, B, A (we eliminated A first). Now
we’re going to explore what happens when we eliminate the variables in the opposite order. First, work
on the case we did, where we’re trying to calculate the probability that node D takes on a particular
value, little d. Remember that little d is a constant in this case. Now, do the case where we’re trying to
find the whole distribution over D, so we don’t know a particular value for little d.

Another Recitation Problem

Find an elimination order that keeps the factors small for the net below, or show that there is no such
order.

Here’s a pretty complicated graph. But notice that no node has more than 2 parents, so none of the
CPTs are huge. The question is, is this graph hard for variable elimination? More concretely, can you find
an elimination order that results only in fairly small factors? Is there an elimination order that generates
a huge factor?

The Last Recitation Problem

Bayesian networks (or related models) are often used in computer vision, but they almost always
require sampling. What happens when you try to do variable elimination on a model like the grid below?


6. Causal Networks:

A causal network is an acyclic digraph arising from an evolution of a substitution system, and
representing its history.

[Figure omitted: a causal network produced by a set of substitution rules, applied in a left-to-right scan, from a given initial condition.]

[Figure omitted: the procedure for diagrammatically creating a causal network from a mobile automaton.]

In an evolution of a multiway system, each substitution event is a vertex in a causal network.


Two events which are related by causal dependence, meaning one occurs just before the other,
have an edge between the corresponding vertices in the causal network. More precisely, the edge
is a directed edge leading from the past event to the future event.
Some causal networks are independent of the choice of evolution, and these are called causally
invariant.

Exam Date: 23.09.2024 Reg. No.:

VIVEKANANDHA COLLEGE OF TECHNOLOGY FOR WOMEN


Elayampalayam, Tiruchengode, Namakkal – 637 205.
Internal Assessment Test - I
Third Semester
Artificial Intelligence and Data Science
AL3391 - ARTIFICIAL INTELLIGENCE
(Regulation 2021)
Time: 3.00Hrs Maximum: 100 Marks
Answer ALL Questions
Part – A (10 x 2 = 20 marks)
1. Define AI. List out its applications. 2 CO1 L1
2. Characterize the environment of an agent playing soccer. 2 CO1 L3
3. List out the steps involved in problem solving. 2 CO1 L1
4. Define Iterative deepening search. 2 CO1 L1
5. What are the things that the agent knows in online search problems? 2 CO2 L2
6. What is local search? 2 CO2 L1
7. Define heuristic function. 2 CO2 L1
8. Define annealing. 2 CO2 L1
9. What is Game theory? 2 CO3 L1
10. Write about zero sum game. 2 CO3 L1
Part – B (5 x 16 = 80 marks)
11. a What are informed search techniques? Explain any with 16 CO1 L2
examples.
(OR)
11.b List the basic kinds of Intelligent Agents and explain with 16 CO1 L2
neat schematic diagram.

12. a Explain the Breadth First search, Uniform cost Search and 16 CO1 L2
Depth First search algorithms with examples.
(OR)
12.b Explain the Depth limited search, Iterative deepening and 16 CO1 L2
Bidirectional search algorithms with examples.

13. a What is heuristic search technique in AI? How does 16 CO2 L1


heuristics search works? Explain its advantages and
disadvantages.
(OR)
13.b Describe local search algorithms with neat sketch. 16 CO2 L1
14. a Explain about the searching with non-deterministic 16 CO2 L2
actions, with an example to formulate a problem solution.
(OR)
14.b Write about AO* algorithm with example. 16 CO2 L2

15. a Explain about Alpha-Beta pruning algorithm with an 16 CO3 L3


example.
(OR)
15.b (i)Explain about Min-Max algorithm with example. 10 CO3 L3
(ii) Write short notes on Monte-Carlo search. 6 L3

************

Blooms Taxonomy Level (BTL)   Marks in PART A   Marks in PART B   Total Marks   % of Distribution
Remember (L1)                        16                32              48               26
Understanding (L2)                    2                96              98               54
Apply (L3)                            -                32              32               17
Analyze (L4)                          2                 -               2                1
Total                                20               160             180             100%
Exam Date: 23.09.2024 Reg. No.:

VIVEKANANDHA COLLEGE OF TECHNOLOGY FOR WOMEN


Elayampalayam, Tiruchengode, Namakkal – 637 205.
Internal Assessment Test - I
Third Semester
Artificial Intelligence and Data Science
AL3391 - ARTIFICIAL INTELLIGENCE
ANSWER KEY
(Regulation 2021)
Time: 3.00Hrs Maximum: 100 Marks
Answer ALL Questions
Part – A (10 x 2 = 20 marks)

1 Define AI. List out its applications.

Artificial Intelligence (AI) is the simulation of human intelligence in machines


designed to think and act like humans.

 Applications: Robotics, Natural Language Processing (NLP), Expert Systems,


Autonomous Vehicles, Healthcare, Finance.

2 Characterize the environment of an agent playing soccer.

The environment is dynamic, continuous, partially observable, multi-agent, and


stochastic.

3 List out the steps involved in problem-solving.

Initial state, Goal state, Possible actions, State transition model, Solution path.

4 Define Iterative deepening search.

Iterative deepening search is a combination of depth-first and breadth-first search


that repeatedly applies depth-limited search, increasing the limit each time.

5 What are the things that the agent knows in online search problems?

The agent knows the action model, current percepts, and its goal but lacks a
complete model of the environment.

6 What is local search?

Local search is an optimization technique that iteratively improves the current


solution based on local changes rather than exploring the entire search space.

7 Define heuristic function.

A heuristic function estimates the cost from the current state to the goal state,
helping to guide search algorithms.

8 Define annealing.

Simulated annealing is an optimization technique that probabilistically accepts


worse solutions in the hope of escaping local minima, simulating the annealing
process in metallurgy.
9 What is Game Theory?

Game theory is the study of mathematical models of strategic interaction among


rational decision-makers.

10 Write about zero-sum game.

A zero-sum game is a situation in which one participant's gain or loss is exactly


balanced by the losses or gains of the other participants.

Part – B (5 x 16 = 80 marks)

11.a. What are informed search techniques? Explain any with examples.

Informed search techniques, also known as heuristic search techniques, use additional
knowledge beyond the problem description to guide the search towards the goal more
efficiently than uninformed search techniques. These algorithms make use of heuristic
functions, which provide estimates of how close a state is to the goal. The heuristic function
is used to prioritize which nodes or states are explored next.

 Examples of informed search techniques:

1. Best-First Search: This algorithm selects the next node to explore based on a
heuristic function. A simple example is Greedy Best-First Search, which
always selects the node that appears to be the closest to the goal based on a
heuristic function h(n).

 Example: Consider a robot navigating a grid, where each node
represents a point in the grid. The heuristic function could be the
Euclidean distance from the current node to the goal node.

2. A* Search: A more sophisticated informed search that combines the cost to
reach the current state, g(n), and a heuristic function h(n) that
estimates the cost to reach the goal. It searches for the path that minimizes
f(n) = g(n) + h(n), ensuring an optimal solution if the
heuristic is admissible.

 Example: In a city road network, the heuristic h(n) could be the
straight-line distance from the current location to the destination, while
g(n) is the actual distance traveled so far.

Advantages: Heuristic searches, especially A*, are often more efficient and faster in finding
solutions compared to uninformed searches like Breadth-First or Depth-First Search.

Disadvantages: The efficiency of informed searches depends on the quality of the heuristic.
Poor heuristics can lead to suboptimal or slow searches.
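
A minimal A* sketch in Python may make the f(n) = g(n) + h(n) bookkeeping concrete. The adjacency-list representation and the names here are illustrative, not part of the answer above.

import heapq

def a_star(graph, start, goal, h):
    """graph: {node: [(neighbor, edge_cost), ...]}; h(n): heuristic estimate.
    Returns (cost, path) or None if the goal is unreachable."""
    frontier = [(h(start), 0, start, [start])]   # entries are (f = g + h, g, node, path)
    best_g = {start: 0}
    while frontier:
        f, g, node, path = heapq.heappop(frontier)
        if node == goal:
            return g, path
        for nbr, cost in graph.get(node, []):
            g2 = g + cost
            if g2 < best_g.get(nbr, float("inf")):   # found a cheaper route to nbr
                best_g[nbr] = g2
                heapq.heappush(frontier, (g2 + h(nbr), g2, nbr, path + [nbr]))
    return None

With an admissible h (never overestimating), the first time the goal is popped its cost is optimal, matching the optimality claim above.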

11b. List the basic kinds of Intelligent Agents and explain with a neat schematic diagram.

Intelligent agents are systems that perceive their environment through sensors and act
upon it using actuators. They operate autonomously to achieve specific goals based on their
internal knowledge or beliefs. Below are the basic types of intelligent agents:

1. Simple Reflex Agents: These agents select actions based only on the current
percept, ignoring the rest of the percept history. They follow the condition-action
rules.

o Example: A thermostat that turns on heating when the temperature drops


below a threshold.

2. Model-Based Reflex Agents: These agents maintain an internal state to keep track
of aspects of the world that cannot be observed directly. The agent uses a model of
the world to make decisions.
o Example: A self-driving car that keeps track of nearby vehicles and road
conditions.

3. Goal-Based Agents: In addition to the state of the environment, these agents have
goals that they try to achieve. They decide their actions based on the distance from
the goal.

o Example: A chess-playing agent that chooses moves based on the goal of


winning the game.

4. Utility-Based Agents: These agents have a performance measure or utility function


that they try to maximize. They not only pursue goals but also measure the
desirability of different goal states.

o Example: An e-commerce recommendation system that suggests items to


maximize customer satisfaction.

5. Learning Agents: These agents can improve their performance based on past
experiences. They have a learning component that allows them to update their
knowledge or behavior.

o Example: A robot that learns how to navigate a maze based on trial and error.

Schematic Diagram: [Insert diagram showing the flow between the environment, sensors,
actuators, and decision-making components like the model, goals, or utility.]

12a. Explain the Breadth-First Search (BFS), Uniform Cost Search (UCS), and Depth-First
Search (DFS) algorithms with examples.

1. Breadth-First Search (BFS):

o Description: BFS explores all nodes at the current depth before moving to the
next level. It is guaranteed to find the shortest path in an unweighted graph.

o Example: In a maze, BFS would explore all possible adjacent paths step by
step, ensuring that it finds the shortest path to the exit.

o Time Complexity: O(b^d), where b is the branching factor and
d is the depth of the shallowest solution.

o Space Complexity: O(b^d).

2. Uniform Cost Search (UCS):

o Description: UCS is similar to BFS but takes into account varying costs of
edges. It expands the node with the lowest path cost first, ensuring the
optimal solution in weighted graphs.

o Example: Finding the shortest path between cities on a map where each edge
has a different travel cost (e.g., distances or time).

o Time & Space Complexity: O(b^(C*/ε)), where
C* is the cost of the optimal solution and ε is the minimum cost
of any action.

3. Depth-First Search (DFS):

o Description: DFS explores as far as possible along each branch before


backtracking. It uses less memory than BFS but may get stuck in loops if the
search space is infinite.

o Example: In solving puzzles like Sudoku, DFS tries a path until it fails and
then backtracks to try another.
o Time Complexity: O(b^m), where m is the maximum depth of
the search tree.

o Space Complexity: O(bm), much less than BFS due to less
memory storage.
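
A minimal BFS sketch, assuming an adjacency-list graph ({node: [neighbor, ...]}); the names are illustrative.

from collections import deque

def bfs(graph, start, goal):
    """Shortest path by edge count in an unweighted graph.
    Returns the path as a list of nodes, or None."""
    frontier = deque([[start]])
    visited = {start}
    while frontier:
        path = frontier.popleft()
        if path[-1] == goal:
            return path
        for nbr in graph.get(path[-1], []):
            if nbr not in visited:               # expand each node only once
                visited.add(nbr)
                frontier.append(path + [nbr])
    return None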

12b. Explain the Depth-Limited Search, Iterative Deepening Search, and Bidirectional
Search algorithms with examples.

1. Depth-Limited Search:

o Description: Depth-limited search is a variant of DFS where the search is


limited to a pre-specified depth. It prevents infinite loops in the case of infinite
search spaces.

o Example: In chess, a depth-limited search might explore only up to a certain


number of moves to avoid excessive computation.

2. Iterative Deepening Search:

o Description: Iterative Deepening combines the benefits of DFS and BFS. It


repeatedly applies depth-limited search, increasing the depth limit with each
iteration.

o Example: In a robot search problem, iterative deepening explores the


environment level by level but avoids memory overhead by using depth-first
exploration at each level.

3. Bidirectional Search:

o Description: Bidirectional search runs two simultaneous searches – one


forward from the start node and one backward from the goal node. The search
halts when both searches meet.

o Example: In pathfinding algorithms for GPS, bidirectional search can


significantly reduce the number of nodes explored.
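
A minimal sketch of depth-limited search wrapped in an iterative-deepening loop, under the same illustrative adjacency-list representation as the earlier sketches.

def depth_limited(graph, node, goal, limit, path):
    """DFS that refuses to go deeper than `limit` edges."""
    if node == goal:
        return path
    if limit == 0:
        return None
    for nbr in graph.get(node, []):
        if nbr not in path:                      # avoid cycles on this path
            found = depth_limited(graph, nbr, goal, limit - 1, path + [nbr])
            if found:
                return found
    return None

def iterative_deepening(graph, start, goal, max_depth=50):
    """Re-run depth-limited search with an increasing limit, as described above."""
    for limit in range(max_depth + 1):
        found = depth_limited(graph, start, goal, limit, [start])
        if found:
            return found
    return None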

13a. What is heuristic search technique in AI? How does heuristic search work? Explain its
advantages and disadvantages.

A heuristic search is an informed search strategy that uses heuristic functions to guide
the search process, speeding up the search for solutions in large problem spaces by
estimating which path is most promising. The key idea behind heuristic search is to
prioritize states that are likely to lead to a solution faster.

How Heuristic Search Works:

 A heuristic function h(n) provides an estimate of the cost to reach the goal
from a given node n. This is used to evaluate which node should be expanded
next.

 Best-First Search and A* are common heuristic search algorithms. In A*, the total
cost function f(n) = g(n) + h(n) is used, where:

o g(n) is the cost to reach node n from the start node.

o h(n) is the estimated cost to reach the goal from node n.

Example of A* Search:

Consider a robot trying to find the shortest path through a maze. Each cell in the maze has
a cost, and the goal is to minimize the path cost. The heuristic function could be the
Manhattan distance from the current cell to the goal cell. A* will expand the node that
minimizes the total estimated cost (sum of the cost to the node and the estimated
remaining cost).

Advantages of Heuristic Search:

1. Efficiency: It can significantly reduce the number of nodes expanded compared to


uninformed search methods, especially in large search spaces.

2. Optimality: If the heuristic is admissible (never overestimates the true cost), A*


guarantees finding an optimal solution.

3. Faster than BFS and DFS: Since it is guided by the heuristic, the search process
can avoid exploring paths that are unlikely to lead to a solution.

Disadvantages of Heuristic Search:

1. Heuristic Design: The performance of the search depends on the quality of the
heuristic. Designing a good heuristic can be difficult and problem-specific.

2. Memory Intensive: Algorithms like A* need to keep track of all explored nodes,
which can lead to high memory consumption.

3. Inaccurate Heuristics: If the heuristic function is not well designed or is inaccurate,


it can lead to suboptimal or incorrect results.

13b. Describe local search algorithms with neat sketches.

Local search algorithms are a class of search methods that operate by iteratively improving
a single solution. Unlike global search algorithms that explore a wide range of possibilities,
local search focuses on improving the current state by making small changes, often known
as hill climbing.

Types of Local Search Algorithms:

1. Hill Climbing:

o This algorithm begins with an arbitrary solution and attempts to improve it by


making incremental changes (i.e., climbing uphill towards a better solution). If
no better neighboring states are found, the search terminates.

o Example: In optimization problems, such as scheduling, hill climbing can try


swapping two tasks and checking if the new schedule is better.

o Drawback: Hill climbing can get stuck in local optima, where no neighboring
state appears better, but the solution is not globally optimal.

2. Simulated Annealing:

o Inspired by the process of annealing in metallurgy, this algorithm


probabilistically allows worse solutions in early stages (like jumping over
hills), helping escape local optima. As the search progresses, it becomes more
selective.

o Example: In traveling salesman problems, simulated annealing might initially


accept worse routes to avoid getting stuck in poor configurations.

o Advantage: Helps avoid local optima by allowing occasional downhill moves.

3. Genetic Algorithms:

o Genetic algorithms work by evolving a population of solutions over time using


operations such as selection, crossover, and mutation. Each generation of
solutions is assessed based on a fitness function.
o Example: In function optimization, genetic algorithms evolve a set of solutions
by combining the best-performing solutions.

o Advantage: Can explore a broad search space and avoid local optima by
maintaining diversity in the population.

Sketch: [Insert sketches showing hill climbing, simulated annealing, and genetic algorithm
processes with search states.]
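
A minimal hill-climbing sketch; it assumes the caller supplies a neighbours() generator and a score() function to maximise, both hypothetical names.

def hill_climb(initial, neighbours, score):
    """Greedily move to the best neighbour until no neighbour improves.
    May stop at a local optimum, as the drawback above notes."""
    current = initial
    while True:
        best = max(neighbours(current), key=score, default=None)
        if best is None or score(best) <= score(current):
            return current                      # local optimum reached
        current = best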

14a. Explain about searching with non-deterministic actions with an example to formulate
a problem solution.

In some real-world situations, the outcomes of an agent’s actions are not entirely
predictable, leading to non-deterministic search. This occurs when the result of applying
an action can vary, and the agent must account for these possibilities when planning.

Example of Non-Deterministic Search:

Consider a robot navigating a warehouse where the ground may be slippery in some areas.
When the robot issues a command to move forward, the robot may slip and end up in a
different location than expected. The environment is non-deterministic because the same
action (move forward) can lead to different outcomes.

Problem Formulation in Non-Deterministic Search:

1. Initial State: The robot starts at a known position in the warehouse.

2. Actions: The robot can attempt to move forward, backward, left, or right, but the
outcome of these actions is uncertain.

3. Transition Model: Instead of a single deterministic result, the transition model


specifies a set of possible outcomes for each action.

4. Goal State: The robot must reach a specific location in the warehouse.

5. Solution: A contingency plan that accounts for all possible outcomes of actions.
For example, if the robot slips, it may try the same action again or choose a different
action depending on where it ends up.

Key Aspects of Non-Deterministic Search:

 Partial Knowledge: The agent does not know the exact result of its actions but has
some probability distribution of the outcomes.

 Strategies: The agent may need to adopt backup plans or conditional strategies
depending on the possible outcomes.
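
One simple way to encode such a non-deterministic transition model is a mapping from a (state, action) pair to the set of possible successor states. The grid states and slip outcomes below are invented for illustration.

# Hypothetical transition model for the slippery-warehouse example:
# RESULTS[(state, action)] is the *set* of states the robot may end up in.
RESULTS = {
    ((0, 0), "forward"): {(0, 1)},               # dry floor: deterministic
    ((0, 1), "forward"): {(0, 2), (0, 1)},       # slippery: may stay put
    ((0, 2), "forward"): {(0, 3), (1, 2)},       # slippery: may drift sideways
}

def outcomes(state, action):
    return RESULTS.get((state, action), set())

# A solution is then a contingency plan that leads to the goal for *every*
# outcome, e.g. "move forward; if you slipped, try again from where you are".
print(outcomes((0, 1), "forward"))   # {(0, 2), (0, 1)}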

14b. Write about AO* algorithm with an example.

The AO* algorithm is used to solve problems represented as AND-OR graphs, where
some tasks (AND nodes) require multiple sub-tasks to be completed, while others (OR
nodes) can be solved by choosing one of several sub-tasks.

How AO* Works:

 AO* is a combination of A* search and dynamic programming. It uses a heuristic


evaluation function to estimate the cost of reaching a solution and continuously
refines this estimate as it explores the graph.

 In an AND-OR graph, nodes can be solved either by solving all of their child nodes
(AND) or by solving any one of their child nodes (OR).
 AO* updates the cost of each node based on the costs of its children, allowing it to
backtrack and revise decisions as better paths are discovered.

Example of AO*:

Consider a project where multiple tasks need to be completed. Some tasks depend on all of
their subtasks being completed (AND), while others can be solved by completing any one of
their subtasks (OR).

 Initial Node: Start the project.

 Actions: Choose subtasks to work on.

 Solution: The AO* algorithm will guide the project manager to focus on the most
critical path that minimizes the overall project time or cost.

15a. Explain about Alpha-Beta pruning algorithm with an example.

Alpha-Beta pruning is an optimization technique for the minimax algorithm used in two-
player games like chess. It reduces the number of nodes evaluated by the minimax
algorithm, cutting off branches that cannot influence the final decision.

How Alpha-Beta Pruning Works:

 Alpha-Beta pruning works by keeping track of two values during the minimax
search:

o Alpha: The best score that the maximizing player can guarantee.

o Beta: The best score that the minimizing player can guarantee.

 The algorithm prunes (cuts off) branches of the game tree that cannot possibly affect
the outcome of the game, thus saving computation time.

Example:

In a chess game, if a branch shows that the maximizing player can guarantee a win with a
certain move, Alpha-Beta pruning will stop evaluating any other moves that are guaranteed
to be worse.

 Initial Node: The current state of the game.

 Actions: Each possible move by both players.

 Pruning: If a particular move (node) results in a score that is worse than a


previously evaluated move, that entire branch is pruned from the search.

Advantages of Alpha-Beta Pruning:

1. Efficiency: Significantly reduces the number of nodes evaluated.

2. Optimality: Does not affect the outcome of the minimax algorithm—still finds the
optimal move.
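
A minimal minimax-with-alpha-beta sketch over an explicit game tree, where a leaf is a payoff number and an internal node is a list of children; the example tree is the usual textbook one, not taken from the answer above.

def alphabeta(node, maximizing, alpha=float("-inf"), beta=float("inf")):
    """node: a number (leaf payoff) or a list of child nodes.
    Returns the minimax value, pruning branches that cannot matter."""
    if isinstance(node, (int, float)):          # terminal state
        return node
    if maximizing:
        value = float("-inf")
        for child in node:
            value = max(value, alphabeta(child, False, alpha, beta))
            alpha = max(alpha, value)
            if alpha >= beta:                   # minimizer would never allow this
                break
        return value
    else:
        value = float("inf")
        for child in node:
            value = min(value, alphabeta(child, True, alpha, beta))
            beta = min(beta, value)
            if alpha >= beta:                   # maximizer would never allow this
                break
        return value

# Classic three-subtree example: the value is 3, and several leaves in the
# second and third subtrees are never examined.
tree = [[3, 12, 8], [2, 4, 6], [14, 5, 2]]
print(alphabeta(tree, maximizing=True))   # 3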

15b. (i) Explain about Min-Max algorithm with example.

 Minimax Algorithm: The minimax algorithm is a decision rule used in two-player


games. It assumes that both players play optimally. The maximizing player aims to
maximize the score, while the minimizing player tries to minimize it.

o Example: In a game like Tic-Tac-Toe, the minimax algorithm evaluates all


possible moves, assuming that the opponent will also play optimally.

 Steps:
1. The game tree is generated, with each node representing a possible game state.

2. Minimax evaluates the game tree from the terminal states upwards, assigning
values based on the game's outcome.

3. The maximizing player selects the move that leads to the highest score, while
the minimizing player selects the move that leads to the lowest score.

15b(ii). Write short notes on Monte Carlo search.

Monte Carlo Search (MCS) is a heuristic search algorithm that uses randomness and
statistical sampling to estimate the value of different actions or decisions in games or other
problem-solving environments. It is particularly effective when the search space is large or
complex, and an exhaustive search is computationally impractical.

Key Concepts of Monte Carlo Search:

1. Random Sampling: The algorithm randomly samples different possible moves or


actions to explore the search space. This makes it feasible to deal with complex
problems where traditional search algorithms might be overwhelmed.

2. Simulation: In games, Monte Carlo Search often involves simulating many random
games from the current state to the end. Based on the outcomes of these
simulations, the algorithm estimates the value of different actions.

3. Evaluation: Each action's effectiveness is evaluated based on how often it leads to a


favorable outcome in these simulations. The action with the highest estimated value
is chosen.

Monte Carlo Tree Search (MCTS):

A common and more advanced form of Monte Carlo search is Monte Carlo Tree Search
(MCTS), which is widely used in game-playing AI, such as for Go and Chess. MCTS builds a
search tree by progressively refining the value estimates for different moves through four
key steps:

1. Selection: Starting from the root node (current game state), a path is selected
according to a certain strategy, often guided by the Upper Confidence Bound (UCB)
formula to balance exploration and exploitation (a standard form is shown after this list).

2. Expansion: New nodes (representing game states) are added to the tree based on
possible moves from the selected node.

3. Simulation: Random simulations are run from the new node until a terminal state
(game end) is reached.

4. Backpropagation: The result of the simulation is used to update the values of the
nodes along the path that was selected, improving their estimates for future
decisions.
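
The UCB1 rule referred to in the selection step is commonly written as follows; this is standard MCTS background rather than something stated in the notes. Here w_i is the number of wins recorded under child i, n_i its visit count, N the parent's visit count, and c an exploration constant (often √2):

UCB1(i) = w_i / n_i + c · √(ln N / n_i)

The first term rewards moves that have performed well so far (exploitation); the second term grows for rarely visited moves (exploration).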

Example of Monte Carlo Search:

In a game like Go, where the number of possible moves is immense, it is impossible to
evaluate every possible sequence of moves like in chess. MCTS simulates thousands of
games, randomly playing out different scenarios from a given position. Based on which
simulations result in a win, the algorithm learns which moves are more likely to succeed
and prioritizes them.

Advantages:

1. Scalability: Monte Carlo search can handle large search spaces and complex
problems efficiently.
2. Anytime Algorithm: MCTS can be stopped at any time, providing a reasonably good
solution based on the simulations run so far.

3. Applicability to Uncertain Environments: It can be applied to domains with


uncertainty and incomplete information, such as games or real-world decision-
making problems.

Disadvantages:

1. Randomness: Since it relies on random sampling, the results may vary between
runs unless a very large number of simulations are performed.

2. High Computation: Requires many simulations to converge to a reliable estimate,


which can be computationally expensive for real-time applications.
Exam Date: Reg. No.:

VIVEKANANDHA COLLEGE OF TECHNOLOGY FOR WOMEN


Elayampalayam, Tiruchengode, Namakkal – 637 205.
Internal Assessment Test - II
Third Semester
Artificial Intelligence and Data Science
AL3391 - ARTIFICIAL INTELLIGENCE
(Regulation 2021)
Time: 3.00Hrs Maximum: 100 Marks
Answer ALL Questions
Part – A (10 x 2 = 20 marks)
1. What is Bayes rule? Mention its use. 2 CO5 L1
2. Why does uncertainty arise? 2 CO5 L1
3. State Unification in First-order logic. 2 CO4 L1
4. What are the three levels in describing a knowledge-based agent? 2 CO4 L1
5. What do you mean by constraint propagation? 2 CO4 L2
6. Write the components of a game. 2 CO3 L1
7. What is the purpose of relational probability models? 2 CO4 L1
8. How are Bayesian networks represented? 2 CO5 L1
9. Define the terms belief state and state estimation. 2 CO5 L1
10. Justify why we cannot use traditional minimax for games
with an element of chance, such as backgammon. 2 CO3 L2
Part – B (5 x 16 = 80 marks)
11. a What is Conjunctive Normal Form? Illustrate and explain 16 CO4 L2
the procedure to convert sentences into Conjunctive
normal form with neat example.
(OR)
11.b Explain standard quantifiers of first order logic with 16 CO4 L1
example.

12. a What is Bayesian network? Explain the method for 16 CO5 L1


constructing Bayesian networks.
(OR)
12.b Explain in detail about causal network. 16 CO5 L1

13. a Explain about the Inference in Bayesian networks. 16 CO5 L1


(OR)
13.b How to represent knowledge in an uncertain domain? 16 CO5 L1

14. a Discuss the Knowledge Engineering Process with proper 16 CO4 L1


illustration. Depict the concept of forward chaining
(OR)
14.b Give the completeness of proof of resolution. 16 CO4 L3

15. a Explain the constraint Satisfaction problem with example. 16 CO3 L2


(OR)
15.b Explain local search in CSP. 16 CO3 L1

************

Blooms Taxonomy Level (BTL)   Marks in PART A   Marks in PART B   Total Marks   % of Distribution
Remember (L1)                        18               112             130               72
Understanding (L2)                    2                32              34               19
Apply (L3)                            -                16              16                9
Analyze (L4)                          -                 -               -                -
Total                                20               160             180             100%
Exam Date: Reg. No.:

VIVEKANANDHA COLLEGE OF TECHNOLOGY FOR WOMEN


Elayampalayam, Tiruchengode, Namakkal – 637 205.
Internal Assessment Test - II
Third Semester
Artificial Intelligence and Data Science
AL3391 - ARTIFICIAL INTELLIGENCE
ANSWER KEY
(Regulation 2021)
Time: 3.00Hrs Maximum: 100 Marks
Answer ALL Questions
Part – A (10 x 2 = 20 marks)

1. What is Bayes' Rule? Mention its use.

 Bayes' Rule is a mathematical formula used to update the probability of a
hypothesis based on new evidence. It expresses the conditional probability of an
event A given evidence B:

P(A | B) = P(B | A) · P(A) / P(B)

o Where:
 P(A | B) is the posterior probability of A given B.
 P(B | A) is the likelihood of B given A.
 P(A) is the prior probability of A.
 P(B) is the evidence or marginal probability of B.
 Uses:
o Medical Diagnosis: Given symptoms, update the probability of a disease.
o Spam Filtering: Given certain keywords, determine the probability of an
email being spam.
o Machine Learning: Used in algorithms like Naive Bayes for classification
tasks.

2. Why does uncertainty arise?

 Uncertainty arises due to incomplete information, complexity, and ambiguity.


Some reasons include:
o Lack of information: Not having enough data to make precise predictions.
o Unpredictable environments: Situations where outcomes are not
deterministic (e.g., chance events, incomplete knowledge).
o Measurement error: Data may be noisy, inaccurate, or imprecise.
o Modeling limitations: Simplified models may not capture all real-world
complexities.

3. State Unification in First-order logic.

 Unification is the process of finding a substitution that makes two first-order
logic expressions identical; the UNIFY algorithm returns such a substitution (a
unifier) when one exists.
 Example: UNIFY(Knows(John, x), Knows(John, Jane)) = {x/Jane}.
4. What are the three levels in describing a knowledge-based agent?

 A knowledge-based agent operates at three levels:


1. Knowledge Level: Describes what the agent knows and how it can use that
knowledge to make decisions.
2. Symbolic Level: Deals with the representation of knowledge in the agent
(such as facts, rules, and logical expressions).
3. Computational Level: Focuses on the algorithms and mechanisms the agent
uses to process and reason with its knowledge.

5. What do you mean by constraint propagation?

 Constraint Propagation is a technique used in constraint satisfaction problems


(CSPs) to reduce the search space by enforcing constraints early. It works by
deducing values for variables that must hold true given other variable assignments
and constraints.
 Example: In a Sudoku puzzle, once you fill in a number, it reduces the possible
values for other cells (constraint propagation).

6. Write the components of a game.

 The components of a game include:


1. Players: The entities involved in the game, each with goals and strategies.
2. Actions: The possible moves a player can make.
3. States: The configurations or conditions of the game at any point in time.
4. Transitions: The rules defining how one state leads to another based on
actions.
5. Payoff: The reward or utility received by a player based on the outcome of the
game (often a numeric value).
6. Information: The knowledge each player has about the game state and other
players’ actions.

7. What is the purpose of relational probability models?

 Relational Probability Models (RPMs) are used to model complex relationships


between variables in a probabilistic framework. They are particularly useful when
there are multiple entities and interactions between them that must be captured in a
probabilistic model.
 Purpose: They help represent dependencies and uncertainty in structured domains,
like relational databases, where entities are related to each other (e.g., in social
networks, biological systems, or recommendation systems).

8. How are Bayesian networks represented?

 Bayesian Networks (BNs) are represented as directed acyclic graphs (DAGs)


where:
1. Nodes represent random variables.
2. Edges represent probabilistic dependencies between variables.
 Each node in a Bayesian network is associated with a conditional probability
distribution (CPD) that quantifies the relationship between the node and its parents
in the graph.

9. Define the terms belief state and state estimation.

 Belief State: A representation of the agent's knowledge about the current state of
the world. It often includes the probability distribution over all possible states based
on available information.
 State Estimation: The process of estimating the current state of the world based on
partial observations and prior knowledge. It may involve techniques like filtering,
smoothing, or using algorithms like the Kalman Filter to estimate the state from
noisy data.

10. Justify why we cannot use traditional minimax for games with an element of chance,
such as backgammon.

 Traditional Minimax assumes a deterministic game where the outcome of any


action is fully predictable. However, in games with an element of chance (e.g., dice
rolls in backgammon), the outcome of actions is probabilistic.
 In such games, we cannot simply assume that the best move leads directly to a
winning state; instead, we must account for uncertainty by considering the
probability of various outcomes. This leads to using expectiminimax (a variant of
minimax) that handles both decision-making and probabilistic events (like dice rolls)
in the game. It incorporates expectation (averaging over possible outcomes) for
moves involving chance.

Part – B (5 x 16 = 80 marks)

11.a. What is Conjunctive Normal Form? Illustrate and explain the procedure to
convert sentences into Conjunctive normal form with neat example.

Conjunctive Normal Form (CNF) is a standardized way of expressing logical formulas


in propositional logic, where the formula is expressed as a conjunction (AND) of clauses,
and each clause is a disjunction (OR) of literals. CNF is widely used in fields like computer
science, artificial intelligence, and logic programming because many algorithms for logic-
based problem-solving are designed to work with CNF.

Key Terms

 Literal: A literal is either an atomic proposition (like P, Q) or its negation
(¬P, ¬Q).
 Clause: A clause is a disjunction (OR) of literals, e.g., P ∨ ¬Q.
 Conjunction of Clauses: CNF requires that a formula is a conjunction (AND) of
clauses, such as (P ∨ Q) ∧ (¬P ∨ R).

Characteristics of CNF

A formula is in CNF if:

1. It is a conjunction (AND) of one or more clauses.


2. Each clause is a disjunction (OR) of literals.
3. No other logical operators (like implications or biconditionals) are used except
negations directly applied to variables.

Steps to Convert a Sentence to CNF

To convert a logical sentence into CNF, you typically follow these steps:

1. Eliminate Biconditionals and Implications:

o Replace biconditional A ↔ B with (A → B) ∧ (B → A).
o Replace implication A → B with ¬A ∨ B.
2. Move Negations Inward (using De Morgan's laws):
o Apply De Morgan's laws to move negations inward, converting ¬(A ∧ B)
to ¬A ∨ ¬B, and ¬(A ∨ B) to ¬A ∧ ¬B.
o Eliminate double negations, so ¬(¬A) becomes A.
3. Distribute OR over AND:
o Use the distributive property to transform expressions into a conjunction of
disjunctions. For example, (A ∧ B) ∨ C becomes (A ∨ C) ∧ (B ∨ C).
4. Simplify (if needed):
o Remove any duplicate literals or clauses if they appear.

Example Conversion

Let's convert the following expression to CNF:

(A → (B ∨ C)) ∧ (¬B → A)

Step 1: Eliminate Implications

Rewrite implications using the rule A → B = ¬A ∨ B:

(¬A ∨ (B ∨ C)) ∧ (B ∨ A)

Simplify the inner expressions:

(¬A ∨ B ∨ C) ∧ (B ∨ A)

Step 2: Check for Distribution (if necessary)

Here, the expression is already in CNF because:

 It is a conjunction of two clauses.
 Each clause is a disjunction of literals.

Thus, the CNF form is:

(¬A ∨ B ∨ C) ∧ (B ∨ A)

Another Example with More Steps

Consider the formula:

((P → Q) ∧ (¬Q → R))

Step 1: Eliminate Implications

Convert each implication to a disjunction:

(¬P ∨ Q) ∧ (Q ∨ R)

Step 2: Distribute (if necessary)

No further distribution is needed here, as each part is already a disjunction.

Thus, the CNF form of this formula is:

(¬P ∨ Q) ∧ (Q ∨ R)


Summary of Conversion Steps

1. Remove biconditionals and implications.


2. Apply De Morgan's laws to move negations inward.
3. Distribute OR over AND to create a conjunction of disjunctions.
4. Simplify to ensure the formula is in CNF.

This process ensures the logical expression is in a standardized CNF form suitable for
logical reasoning algorithms like resolution in propositional logic.
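
For checking such conversions mechanically, SymPy's logic module provides a CNF converter; a small sketch, assuming SymPy is installed:

from sympy.abc import A, B, C
from sympy import And, Implies
from sympy.logic.boolalg import to_cnf

# The first worked example above: (A -> (B v C)) & (~B -> A)
expr = And(Implies(A, B | C), Implies(~B, A))
print(to_cnf(expr))   # a conjunction equivalent to (¬A ∨ B ∨ C) ∧ (A ∨ B)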

11b. Explain standard quantifiers of first order logic with example.

In First-Order Logic (FOL), quantifiers are symbols used to indicate the scope of a
statement over a domain of objects. They specify whether a statement applies universally to
all elements in the domain or only to some elements. The two standard quantifiers in FOL
are:

1. Universal Quantifier ( ∀ )
2. Existential Quantifier ( ∃ )

These quantifiers help formulate statements about properties or relationships within a


domain, allowing FOL to express a wide range of logical assertions.

1. Universal Quantifier ( ∀ )

The universal quantifier (denoted by ∀) specifies that a statement is true for all
elements in the domain. The phrase "for all" or "for every" is often used to express
universal quantification.

 Symbol: ∀
 Meaning: "For all" or "For every"
 Example in Mathematical Language: ∀x P(x)
o Here, P(x) is a predicate, and the statement ∀x P(x) means that P(x) is true for every possible value of x in the
domain.

Example:

 Statement: "All humans are mortal."

 FOL Expression: ∀x (Human(x) → Mortal(x))
o This reads as "For all x, if x is a human, then x is mortal."
o This statement asserts that the property of being mortal applies universally to
all humans.

2. Existential Quantifier ( ∃ )

The existential quantifier (denoted by ∃) specifies that there exists at least one
element in the domain for which a statement is true. The phrase "there exists" or "for
some" is used to express existential quantification.

 Symbol: ∃
 Meaning: "There exists" or "For some"
 Example in Mathematical Language: ∃x P(x)
o Here, P(x) is a predicate, and the statement ∃x P(x) means that there is at least one value of x in the domain for
which P(x) is true.

Example:

 Statement: "There exists a human who is an artist."

 FOL Expression: ∃x (Human(x) ∧ Artist(x))
o This reads as "There exists an x such that x is a human and x is an
artist."
o This statement asserts that there is at least one individual in the domain who
is both a human and an artist.

Combining Quantifiers

Quantifiers can be combined in a single statement to create more complex logical


expressions.

Example of Combination:

 Statement: "For every human, there exists a day they were born."
 FOL Expression: ∀x (Human(x) → ∃y BornOn(x, y))
o This reads as "For all x, if x is a human, then there exists some y such
that x was born on y."
o This statement means that for every human, we can find a specific day of
birth, illustrating a relationship that involves both universal and existential
quantifiers.

 Universal Quantifier ( ∀ ): States that a property applies to all elements in a
domain.
 Existential Quantifier ( ∃ ): States that a property applies to at least one element in
a domain.

Quantifiers enable us to express general statements about objects and their properties in a
precise, logical way, making FOL a powerful tool in fields like mathematics, computer
science, and artificial intelligence.

12a. What is Bayesian network? Explain the method for constructing Bayesian
networks.

A Bayesian Network (BN) is a probabilistic graphical model that represents a set of


variables and their probabilistic dependencies using a directed acyclic graph (DAG). It
provides a compact way to encode joint probability distributions and supports reasoning
under uncertainty.

 Nodes in the graph represent random variables, which could be discrete or


continuous.
 Edges represent direct probabilistic dependencies between variables.
 Each node is associated with a conditional probability distribution (CPD) that
quantifies the effect of the parent nodes on the node.
Example:

A Bayesian network for a medical diagnosis might include variables like:

 D: Disease
 S: Symptoms
 T: Test results

The graph could show D influencing S and T, indicating that symptoms and test
results depend on the presence of a disease.

Properties of a Bayesian Network

1. Directed Acyclic Graph (DAG):


o No cycles in the graph.
o The direction of edges reflects the causal or probabilistic dependencies.
2. Joint Probability Representation:
o The joint probability distribution over the variables can be factored as:
P(X1, X2, …, Xn) = ∏ P(Xi | Parents(Xi)), with the product over i = 1 … n,
where Parents(Xi) are the parent nodes of Xi in the DAG.
3. Local Independence:
o A variable is conditionally independent of its non-descendants given its
parents.

Method for Constructing a Bayesian Network

The construction of a Bayesian Network involves the following steps:

1. Define the Variables

 Identify all relevant variables in the domain.

 Example: In a medical context, variables might include D (Disease), S
(Symptoms), and T (Test results).

2. Determine Dependencies

 Identify the direct probabilistic or causal relationships between variables.

 Represent these relationships as edges in a directed acyclic graph (DAG).
 Example: D → S, D → T.

3. Ensure Acyclic Structure

 Make sure the graph is acyclic, i.e., there are no loops. Modify the structure if
needed to eliminate cycles.

4. Quantify Relationships

 Assign a conditional probability distribution (CPD) to each node:

o For nodes without parents (root nodes), specify the marginal probabilities.
o For nodes with parents, specify the conditional probabilities.
 Example:
o P(D) = [0.1, 0.9] (10% chance of disease, 90% no
disease)
o P(S | D): Probability of symptoms given the disease status.

5. Validate the Network

 Verify that the structure and probabilities are consistent with the problem domain.
 Ensure the network can generate the desired joint probability distribution.

6. Perform Inference

 Use the Bayesian Network for reasoning tasks, such as calculating marginal
probabilities, updating beliefs, or making predictions.
 Example: Given observed symptoms (S), infer the likelihood of the disease (D).

Example of Constructing a Bayesian Network


Scenario:

You want to build a Bayesian Network for determining if a student will pass an exam based
on whether they study and their intelligence level.

Steps:

1. Variables:
o S: Study (Yes/No)
o I: Intelligence (High/Low)
o P: Pass Exam (Yes/No)
2. Dependencies:
o Intelligence (I) affects whether the student passes (P).
o Studying (S) also affects whether the student passes (P).
o I and S are independent of each other.
3. Graph:
o I → P
o S → P
4. CPDs:
o P(I) = [P(High Intelligence) = 0.7, P(Low Intelligence) = 0.3]
o P(S) = [P(Study) = 0.6, P(Not Study) = 0.4]
o P(P | I, S): Conditional probabilities for passing given
intelligence and study status.
5. Joint Probability:
o Using the chain rule: P(I, S, P) = P(I) · P(S) · P(P | I, S)
6. Inference:
o Use the network to calculate the probability of passing given specific
conditions, like P(P | S = Yes).
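
A minimal Python sketch of this student network; the entries of P(P | I, S), which the answer leaves unspecified, are filled in with illustrative assumed numbers.

# CPTs for the I -> P <- S network from the example above.
P_I = {"high": 0.7, "low": 0.3}
P_S = {"study": 0.6, "no_study": 0.4}
# P(Pass=yes | I, S): these four numbers are illustrative assumptions.
P_pass = {("high", "study"): 0.9, ("high", "no_study"): 0.6,
          ("low", "study"): 0.7, ("low", "no_study"): 0.2}

# Joint probability via the chain rule: P(I, S, P) = P(I) P(S) P(P | I, S)
def joint(i, s, passed):
    p = P_pass[(i, s)]
    return P_I[i] * P_S[s] * (p if passed else 1 - p)

# Inference by enumeration: P(Pass=yes | S=study), summing out I.
num = sum(joint(i, "study", True) for i in P_I)
den = sum(joint(i, "study", passed) for i in P_I for passed in (True, False))
print(num / den)   # 0.84 with the assumed numbers above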

Applications of Bayesian Networks

 Medical Diagnosis: Determine the likelihood of diseases based on symptoms and


test results.
 Fault Diagnosis: Identify causes of system failures in engineering.
 Decision Making: Evaluate outcomes in uncertain environments (e.g., finance,
weather forecasting).
 Natural Language Processing: Resolve ambiguities in speech or text understanding.

By representing complex probabilistic relationships in a simple and intuitive structure,


Bayesian Networks are a powerful tool for reasoning under uncertainty.

12b. Explain in detail about causal network.

A causal network, also known as a causal graph or causal Bayesian network, is a type of
graphical model used to represent and reason about causal relationships between variables
in a system. It combines the principles of causality with the structure of a Bayesian
network, making it a powerful tool for understanding, explaining, and predicting the effects
of interventions or changes in a system.

 A causal network is a directed acyclic graph (DAG) where:

o Nodes represent variables (events, states, or attributes).
o Edges (arrows) represent direct causal relationships between the variables.

For example, if A causes B, there will be a directed edge from A to B.

Characteristics of a Causal Network

1. Causality Representation:
o The direction of an edge indicates the cause-and-effect relationship.
o For example, A → B implies that changes in A cause
changes in B.
2. Markov Property:
o A variable in the network is conditionally independent of its non-descendants
given its parents. This simplifies reasoning about relationships.
3. Joint Probability Factorization:
o Similar to Bayesian networks, the joint probability of all variables can be
expressed as: P(X1, X2, …, Xn) = ∏ P(Xi | Parents(Xi)), with the product over i = 1 … n.
4. Intervention Analysis:
o Causal networks allow for interventions to test "what if" scenarios. For
example, setting a variable A to a specific value and observing its effect on
other variables.

Components of a Causal Network

1. Nodes:
o Represent variables (discrete or continuous) in the system.
o Example: Smoking, Lung Cancer, Coughing.
2. Directed Edges:
o Indicate causal influence. For instance, Smoking → Lung Cancer means smoking
directly affects lung cancer.
3. Conditional Probability Distributions (CPDs):
o Quantify the strength of causal relationships. Each node is associated with a
CPD that specifies the probability of that node's values given its parents.
Example of a Causal Network
Scenario:

A simplified model of a health system includes the following variables:

 A: Air pollution level
 S: Smoking habit
 L: Lung cancer
 C: Chronic cough

Causal Relationships:

 A → L (Air pollution causes lung cancer).
 S → L (Smoking causes lung cancer).
 L → C (Lung cancer causes chronic cough).

Graph Representation:

A   S
 \ /
  L
  |
  C

Joint Probability:

P(A, S, L, C) = P(A) · P(S) · P(L | A, S) · P(C | L)

Method for Constructing a Causal Network

1. Identify Variables:
o Define all relevant variables in the system.
o Example: In a medical scenario, these could include symptoms, diseases, and
environmental factors.
2. Determine Causal Relationships:
o Use domain knowledge, data, or experiments to establish direct causal links
between variables.
o Example: Smoking causes lung cancer.
3. Create a Directed Acyclic Graph (DAG):
o Represent the variables as nodes and the causal relationships as directed
edges.
o Ensure there are no cycles.
4. Quantify Relationships:
o Assign a conditional probability distribution (CPD) to each node based on its
parents.
o Use historical data, expert knowledge, or statistical methods to estimate
probabilities.
5. Validate the Model:
o Ensure the network aligns with known causal relationships.
o Perform sensitivity analysis or compare predictions against observed data.
Applications of Causal Networks

1. Medical Diagnosis:
o Understanding causal relationships between symptoms, diseases, and
treatments.
o Example: Predicting how a treatment affects recovery.
2. Policy Decision-Making:
o Evaluating the effects of interventions, such as a new law or policy.
o Example: Studying the impact of reducing air pollution on public health.
3. Machine Learning and AI:
o Improving interpretability and robustness in decision-making systems.
o Example: Building explainable AI models.
4. Economics and Social Sciences:
o Analyzing the impact of variables like education, income, and policy changes
on economic outcomes.

Example: Causal Network for Education and Earnings


Scenario:

A researcher wants to model how education affects earnings, mediated by skills, and how
work experience also contributes.

Variables:

 E: Education
 S: Skills
 W: Work experience
 Y: Earnings

Causal Relationships:

 E → S: Education improves skills.
 S → Y: Skills affect earnings.
 W → Y: Work experience affects earnings.

Graph Representation:

E
|
S   W
 \ /
  Y

Joint Probability:

P(E, W, S, Y) = P(E) · P(W) · P(S | E) · P(Y | S, W)

This causal network helps in understanding how changes in education or work experience
influence earnings.

Advantages of Causal Networks

1. Intuitive Representation:
o Provides a clear graphical view of causal relationships.
2. Intervention Analysis:
o Enables testing "what if" scenarios by simulating interventions.
3. Efficient Inference:
o Allows for reasoning under uncertainty and predicting outcomes.
4. Insight into Dependencies:
o Helps identify independent and dependent variables for better decision-
making.

Causal networks are essential for reasoning in complex systems, providing a structured
way to understand cause-and-effect relationships and enabling data-driven decision-
making.

13a. Explain about the Inference in Bayesian networks.

Inference in a Bayesian network is the process of answering probabilistic queries


about the network. These queries might involve computing the probability of certain
variables given evidence, predicting outcomes, or diagnosing causes based on
observations. Inference enables reasoning under uncertainty, a key capability of
Bayesian networks.

Types of Inference

1. Marginal Inference:
o Calculates the marginal probability of a variable.
o Example: P(A), the probability of A, without any evidence.
2. Conditional Inference:
o Computes the probability of a variable given evidence about other
variables.
o Example: P(A | B = b), the probability of A
given B = b.
3. Most Probable Explanation (MPE):
o Finds the most likely values of all variables given evidence.
o Example: Given observed symptoms, find the most likely disease.
4. Maximum a Posteriori (MAP):
o Identifies the most likely values for a subset of variables given evidence.

Methods for Inference in Bayesian Networks

Inference in Bayesian networks can be exact or approximate, depending on the


complexity of the network and the size of the query.

1. Exact Inference

Exact inference methods compute probabilities precisely but may become


computationally expensive for large networks.
a. Enumeration

 Enumerates all possible values of variables in the network to calculate
probabilities.
 Directly applies the chain rule of probability.
 Example:
o To compute P(A | B = b), sum over all possible
configurations of the other variables.

b. Variable Elimination

 A more efficient method than enumeration, using dynamic programming.


 Eliminates irrelevant variables by marginalizing them out iteratively.
 Steps:
1. Factorize the joint probability distribution.
2. Marginalize over variables not in the query or evidence.
3. Multiply resulting factors.

c. Message Passing (Belief Propagation)

 Used for trees or polytree structures (acyclic networks).


 Computes probabilities by passing messages between nodes.
 Types:
o Sum-Product Algorithm: For marginal probabilities.
o Max-Product Algorithm: For MAP or MPE queries.

d. Junction Tree Algorithm

 Converts the Bayesian network into a tree-like structure to simplify


computations.
 Useful for networks with loops.

2. Approximate Inference

For large or complex networks, exact methods become intractable, and approximate
methods are used.

a. Monte Carlo Sampling

 Generates random samples to estimate probabilities.


 Types:
o Rejection Sampling: Discards samples inconsistent with evidence.
o Likelihood Weighting: Assigns weights to samples based on evidence
likelihood.
o Gibbs Sampling: Iteratively samples each variable conditioned on the
current values of other variables.

b. Loopy Belief Propagation

 Extends belief propagation to networks with loops.


 Provides approximate results but is computationally efficient.
c. Variational Inference

 Approximates the true distribution with a simpler one and minimizes the
difference (e.g., using Kullback-Leibler divergence).

Example of Inference
Bayesian Network:

 Variables:
o CCC: Cloudy
o RRR: Rain
o SSS: Sprinkler
o WWW: Wet Grass
 Structure:
o C→RC \rightarrow RC→R, C→SC \rightarrow SC→S, R→WR
\rightarrow WR→W, S→WS \rightarrow WS→W

Query:

Compute P(R = true | W = true), the probability of rain given that the grass is wet.

Exact Inference (Using Variable Elimination):

1. Start with the joint probability:

P(C, R, S, W) = P(C) P(R | C) P(S | C) P(W | R, S)

2. Marginalize out C and S to get:

P(R, W) = Σ_C Σ_S P(C) P(R | C) P(S | C) P(W | R, S)

3. Normalize over W = true to get P(R | W = true).
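
The steps above can be made concrete with a short Python sketch. It computes P(R = true | W = true) by direct enumeration (summing the joint over the hidden variables C and S); the CPT numbers are assumed for illustration, since the document does not specify them.

from itertools import product

# Illustrative CPTs (numbers assumed for demonstration; the document does
# not specify them). True/False are the two values of each variable.
P_C = {True: 0.5, False: 0.5}
P_R_given_C = {True: 0.8, False: 0.2}      # P(R=true | C)
P_S_given_C = {True: 0.1, False: 0.5}      # P(S=true | C)
P_W_given_RS = {(True, True): 0.99, (True, False): 0.90,
                (False, True): 0.90, (False, False): 0.0}  # P(W=true | R, S)

def joint(c, r, s, w):
    """P(C, R, S, W) = P(C) P(R|C) P(S|C) P(W|R, S) via the chain rule."""
    pr = P_R_given_C[c] if r else 1 - P_R_given_C[c]
    ps = P_S_given_C[c] if s else 1 - P_S_given_C[c]
    pw = P_W_given_RS[(r, s)] if w else 1 - P_W_given_RS[(r, s)]
    return P_C[c] * pr * ps * pw

# Sum the joint over the hidden variables C and S, then normalize over W=true.
num = sum(joint(c, True, s, True) for c, s in product([True, False], repeat=2))
den = sum(joint(c, r, s, True) for c, r, s in product([True, False], repeat=3))
print("P(R=true | W=true) =", num / den)   # ≈ 0.708 with these numbers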

Approximate Inference (Using Gibbs Sampling):

1. Start with random values for C, R, S, W.
2. Iteratively sample each variable conditioned on the current values of the others.
3. Use the samples to estimate P(R | W = true).
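
A minimal Gibbs-sampling sketch for the same query is given below; it assumes the joint() function and CPT numbers from the enumeration sketch above.

import random

# Gibbs sampling for P(R=true | W=true); reuses joint() from the sketch above.
def gibbs_estimate(joint, n_samples=100_000, seed=0):
    rng = random.Random(seed)
    state = {"c": True, "r": True, "s": True}   # arbitrary initial values
    w = True                                    # evidence: W is clamped to true
    rain_count = 0
    for _ in range(n_samples):
        for name in ("c", "r", "s"):
            # P(X | everything else) is proportional to the full joint
            # evaluated with X = true vs. X = false. (With these CPTs the
            # denominator never becomes zero for reachable states.)
            p_t = joint(**{**state, name: True}, w=w)
            p_f = joint(**{**state, name: False}, w=w)
            state[name] = rng.random() < p_t / (p_t + p_f)
        rain_count += state["r"]
    return rain_count / n_samples

# print(gibbs_estimate(joint))   # ≈ 0.708, matching exact inference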

Applications of Bayesian Network Inference

1. Medical Diagnosis:
o Compute the probability of a disease given symptoms.
2. Fault Detection:
o Identify the most likely cause of a system failure based on observed
anomalies.
3. Decision Support Systems:
o Make predictions or recommendations under uncertainty.
4. Natural Language Processing:
o Resolve ambiguities in text or speech.

Challenges in Bayesian Network Inference

1. Complexity:
o Exact inference is NP-hard for general Bayesian networks.
2. Scalability:
o Large networks with many variables require approximate methods.
3. Data Sparsity:
o Incomplete data can make accurate inference challenging.

Inference in Bayesian networks is fundamental for probabilistic reasoning and decision-making, enabling robust predictions and insights across domains like AI, medicine, and engineering.

13b. How to represent knowledge in an uncertain domain?

1. Probability Theory

Probability theory is one of the most widely used frameworks for representing uncertainty.
It assigns probabilities to events or propositions, reflecting the likelihood of their
occurrence.

 Bayesian Networks: These are directed acyclic graphs where nodes represent
random variables and edges represent probabilistic dependencies. Bayesian
networks allow for inference, diagnostics, and prediction in uncertain environments.

Example: In a medical diagnosis system, the probability of having a disease given certain symptoms can be represented using a Bayesian network.

 Markov Networks: These are undirected graphical models that represent joint
probabilities without directed edges, useful for domains with symmetrical
relationships.
 Hidden Markov Models (HMMs): Used for time-series data, HMMs represent
systems that evolve over time with states that are not directly observable but can be
inferred through observable data.

2. Fuzzy Logic

Fuzzy logic extends traditional Boolean logic to handle partial truth values between 0 and
1. It is particularly useful for reasoning in domains where information is vague or
imprecise.

 Membership Functions: Variables in fuzzy logic are represented by membership functions, which indicate the degree of truth of a statement (e.g., "temperature is high" could be 0.8 true instead of fully true or false).
 Fuzzy Inference Systems: These systems use fuzzy rules to draw conclusions,
where the truth values are partial and allow for a continuum of truth.

Example: In control systems, such as air conditioning, fuzzy logic can interpret
ambiguous terms like "slightly warm" or "very hot" and make decisions accordingly.
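
A minimal sketch of fuzzy membership, assuming simple triangular membership functions (the set boundaries below are illustrative, not from the document):

def tri(x, a, b, c):
    """Triangular membership function: rises from a to b, falls from b to c."""
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x <= b else (c - x) / (c - b)

# Illustrative fuzzy sets for room temperature (degrees Celsius).
def warm(t): return tri(t, 18, 25, 32)
def hot(t):  return tri(t, 28, 35, 42)

t = 30
# A fuzzy rule "IF hot THEN fan = high" fires to the degree that 'hot' is true.
print(f"warm({t}) = {warm(t):.2f}, hot({t}) = {hot(t):.2f}")
# warm(30) ≈ 0.29 and hot(30) ≈ 0.29 -> both partially true at once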

3. Belief Networks (Dempster-Shafer Theory)

Dempster-Shafer theory is used to manage uncertainty when evidence supports multiple possibilities rather than precise probabilities. It assigns a degree of belief to each proposition, allowing for reasoning under incomplete knowledge.

 Belief Functions: Belief functions allow for expressing degrees of belief for
propositions based on evidence.
 Dempster’s Rule of Combination: Used to combine evidence from multiple sources
and update beliefs.

Example: In a fault diagnosis system, multiple sensors might give conflicting or partial information about a failure, which can be combined using Dempster-Shafer theory to produce an overall belief about the cause.
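
Dempster's rule of combination can be sketched in a few lines; the mass values and the motor/pump fault hypotheses below are assumed for illustration:

from itertools import product

def dempster_combine(m1, m2):
    """Dempster's rule: combine two mass functions over frozenset hypotheses."""
    combined, conflict = {}, 0.0
    for (a, x), (b, y) in product(m1.items(), m2.items()):
        inter = a & b
        if inter:
            combined[inter] = combined.get(inter, 0.0) + x * y
        else:
            conflict += x * y            # mass falling on the empty set
    return {h: v / (1 - conflict) for h, v in combined.items()}

# Illustrative sensor evidence about a fault (values assumed):
M, P = frozenset({"motor"}), frozenset({"pump"})
m1 = {M: 0.7, M | P: 0.3}          # 0.3 is uncommitted belief
m2 = {P: 0.4, M | P: 0.6}
print(dempster_combine(m1, m2))    # motor ≈ 0.58, pump ≈ 0.17, either ≈ 0.25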

4. Certainty Factors

Certainty factors (CFs) are used to represent the strength of belief or disbelief in a
hypothesis given evidence. They are common in expert systems and provide a measure
between -1 (complete disbelief) and +1 (complete belief).

 CF Rule-Based Systems: Certainty factors are often used with rule-based systems,
where each rule is associated with a certainty factor.

Example: In an expert system for medical diagnosis, if a doctor is 70% confident that certain symptoms indicate a disease, this confidence can be represented as a certainty factor.
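
A minimal sketch of the MYCIN-style rule for combining two certainty factors (the 0.7 and 0.5 values below are illustrative):

def combine_cf(cf1, cf2):
    """MYCIN-style combination of two certainty factors in [-1, 1]."""
    if cf1 >= 0 and cf2 >= 0:
        return cf1 + cf2 * (1 - cf1)
    if cf1 < 0 and cf2 < 0:
        return cf1 + cf2 * (1 + cf1)
    return (cf1 + cf2) / (1 - min(abs(cf1), abs(cf2)))

# Two rules both supporting the same diagnosis with CFs 0.7 and 0.5:
print(combine_cf(0.7, 0.5))   # 0.85 -> belief strengthens but never exceeds 1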

5. Rule-Based Systems with Heuristics

In uncertain domains, rules can be created with heuristics, providing approximate or empirical knowledge to guide reasoning when data is ambiguous or incomplete.

 Forward Chaining and Backward Chaining: These are rule-based reasoning methods that allow systems to infer conclusions or explain evidence even when data is not fully certain.

Example: A troubleshooting system for machinery may use rules like “If the engine
won’t start, check the fuel level” without a strict probabilistic model, relying instead
on empirical rules and heuristics.
6. Qualitative Reasoning

Qualitative reasoning provides descriptions of how systems behave in terms of relative changes and relationships, useful when quantitative data is sparse or unavailable.

 Qualitative Models: Represent relationships in terms of direction (e.g., increase, decrease) rather than exact quantities.

Example: A weather forecasting system may reason that an increase in cloud cover
often leads to cooler temperatures without specifying exact measurements.

7. Non-Monotonic Logic

In domains where knowledge is incomplete or can change over time, non-monotonic logic
allows systems to retract conclusions when new information contradicts them.

 Default Logic and Circumscription: Used to make assumptions in the absence of complete information, but allow these assumptions to be revised as new evidence emerges.

Example: A legal expert system may assume a suspect is innocent (by default) until
evidence suggests otherwise, updating its conclusions with new data.

8. Neural Networks and Machine Learning Models

Machine learning models, especially neural networks, can model uncertainty by learning
patterns from large datasets. They are capable of capturing complex relationships that may
not be easily expressed through probabilistic or rule-based models.

 Probabilistic Neural Networks: Provide probabilistic outputs, representing uncertainty in predictions.
 Bayesian Neural Networks: Incorporate uncertainty into model parameters, useful
for domains where prediction confidence is important.

Example: Image recognition systems may output probabilities for multiple classifications, capturing the uncertainty of each prediction.

 Bayesian Networks, Markov Models, and Probability Theory are best for domains
with known probabilistic relationships.
 Fuzzy Logic works well with vague, imprecise information.
 Dempster-Shafer Theory and Certainty Factors allow belief representation in
cases of partial or conflicting evidence.
 Rule-Based Systems with Heuristics are useful for empirical and expert-driven
reasoning.
 Qualitative Reasoning provides relational knowledge when exact data is sparse.
 Non-Monotonic Logic is ideal for evolving knowledge bases.
 Machine Learning Models capture complex patterns and can quantify uncertainty
in predictions.

Choosing the right method depends on the nature of the domain, the type of uncertainty
involved, and the reasoning tasks required. Often, combining several approaches is
necessary to address different aspects of uncertainty effectively.
14a. Discuss the Knowledge Engineering Process with proper illustration. Depict the
concept of forward chaining.

Knowledge Engineering Process

The Knowledge Engineering Process involves systematically building a knowledge-based system to solve complex problems in specific domains by representing, organizing, and processing knowledge. This process is crucial in developing expert systems, decision support systems, and other intelligent applications. It includes several structured stages that ensure accurate knowledge capture, representation, and application.

Stages of the Knowledge Engineering Process

1. Problem Definition and Knowledge Acquisition:
o Define the domain, scope, and objectives of the system.
o Identify the domain experts and gather knowledge from them through
interviews, questionnaires, observations, and document analysis.
o Example: In a medical diagnosis system, domain experts (doctors) provide
knowledge about diseases, symptoms, and treatments.
2. Knowledge Representation:
o Organize the acquired knowledge into a format suitable for reasoning.
o Common representations include rules, frames, semantic networks,
decision trees, and Bayesian networks.
o Example: A rule-based approach could be used to represent medical
knowledge, where a rule might state: “If the patient has a fever and cough,
then consider flu.”
3. Knowledge Encoding:
o Translate knowledge into a machine-readable format using programming
languages or knowledge representation languages (e.g., Prolog, Lisp).
o The encoded knowledge should allow efficient access, inference, and updating.
o Example: Encoding the medical rules into a Prolog database, where
conditions and conclusions are represented in logical statements.
4. Knowledge Validation and Testing:
o Ensure that the system accurately represents expert knowledge and performs
correctly on real or simulated cases.
o Domain experts review and validate the system’s outputs to ensure they align
with expected results.
o Example: Testing the medical diagnosis system on patient case studies to
verify that it provides correct diagnoses based on symptoms.
5. Knowledge Refinement and Maintenance:
o Continuously update the knowledge base to reflect changes in the domain.
o Maintenance includes adding new rules, removing outdated information, and
refining existing knowledge based on user feedback.
o Example: Updating the medical system with new treatments or disease
information as medical research advances.
6. Deployment and Evaluation:
o Deploy the system in a real-world environment, monitor its performance, and
gather feedback from users.
o Evaluate its impact, usability, and effectiveness, and make adjustments as
needed.
Illustration of Knowledge Engineering Process

Imagine we are developing an Agricultural Expert System to diagnose plant diseases. Here's how the knowledge engineering process might look:

1. Problem Definition: Define the system's goal as diagnosing plant diseases based on
symptoms.
2. Knowledge Acquisition: Interview agricultural experts to gather knowledge on
various plant diseases and their symptoms.
3. Knowledge Representation: Use a rule-based representation, where each disease is
associated with a set of symptoms.
4. Knowledge Encoding: Encode these rules in a system such as Prolog or a decision
tree model.
5. Knowledge Validation: Test the system with known cases to ensure accurate
diagnosis.
6. Knowledge Maintenance: Update the system with new diseases or symptoms based
on expert input or user feedback.

Forward Chaining

Forward chaining is a reasoning method used in rule-based systems where inference starts with known facts and applies rules to derive new facts or conclusions. It is a data-driven approach, beginning with available information and progressing until a goal or conclusion is reached.

1. Starting Point: Begin with known facts or data.
2. Rule Matching: Identify rules whose conditions match the known facts.
3. Rule Application: Apply these rules, adding the conclusions of each rule to the
knowledge base as new facts.
4. Repeat: Continue the process, applying more rules until no more rules can be
applied or a specific goal is achieved.

Example of Forward Chaining

Consider an expert system for plant disease diagnosis. Assume the following rules:

 Rule 1: If leaves are yellow and wilting, then consider nutrient deficiency.
 Rule 2: If nutrient deficiency and soil is dry, then consider irrigation need.
 Rule 3: If leaves have spots and temperature is high, then consider fungal infection.

Let’s say we have the following initial fact:

 "Leaves are yellow and wilting."

Forward Chaining Process:

1. Match Rule 1: Since "leaves are yellow and wilting," apply Rule 1 to conclude
"nutrient deficiency."
2. New Fact Added: The fact "nutrient deficiency" is added to the knowledge base.
3. Match Rule 2: If the system knows "soil is dry," Rule 2 would conclude "irrigation
need."
4. Goal Reached: Forward chaining continues until no more rules apply or the system
reaches a target conclusion.
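
A minimal Python sketch of this forward-chaining loop, using the three plant-diagnosis rules above (the fact strings are illustrative):

# Rules are (set_of_conditions, conclusion) pairs.
rules = [
    ({"leaves yellow", "wilting"}, "nutrient deficiency"),
    ({"nutrient deficiency", "soil dry"}, "irrigation need"),
    ({"leaf spots", "high temperature"}, "fungal infection"),
]

def forward_chain(facts, rules):
    facts = set(facts)
    changed = True
    while changed:                      # repeat until no rule adds a new fact
        changed = False
        for conditions, conclusion in rules:
            if conditions <= facts and conclusion not in facts:
                facts.add(conclusion)   # rule fires: add its conclusion
                changed = True
    return facts

print(forward_chain({"leaves yellow", "wilting", "soil dry"}, rules))
# -> includes 'nutrient deficiency' and, in the next pass, 'irrigation need'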
Illustration of Forward Chaining in Flowchart

Here's a simplified flowchart of forward chaining:

Start
  |
Known Facts
  |
  v
Rule Matching ----(no rules apply)----> End
  |
  | (apply an applicable rule)
  v
Update Knowledge Base
  |
  v
New Facts / Conclusions --(loop back to Rule Matching)
Applications of Forward Chaining

1. Medical Diagnosis: Forward chaining can help determine diseases by starting with
symptoms and applying diagnostic rules to narrow down the possible conditions.
2. Troubleshooting Systems: For example, in an automotive diagnosis system,
starting from observed problems, forward chaining rules can help find probable
faults.
3. Agricultural Expert Systems: Diagnose plant issues by observing symptoms and
applying rules to reach a diagnosis.

The knowledge engineering process is a structured approach to developing knowledge-


based systems, ensuring they accurately represent domain expertise and provide reliable
inference. Forward chaining is a data-driven reasoning technique, valuable in domains
where conclusions need to be derived from known facts and rules.

14b. Give the completeness proof of resolution

Completeness of Resolution

The completeness of resolution refers to the ability of the resolution method to prove any
logical entailment in propositional or first-order logic. Specifically, if a set of clauses is
unsatisfiable (i.e., there is no interpretation that makes all the clauses true), the resolution
method will eventually derive a contradiction (the empty clause, denoted ⊥).

Resolution Method

The resolution method is a rule of inference that operates on clauses in Conjunctive Normal Form (CNF). It combines pairs of clauses with complementary literals to produce a new clause, iteratively simplifying the knowledge base.

Resolution Rule: If we have two clauses

(A ∨ C1) and (¬A ∨ C2),

we can resolve them to produce

C1 ∨ C2,

where A and ¬A are complementary literals.

Completeness of Resolution: Key Concepts

1. Soundness:
o The resolution method is sound, meaning that any clause derived by
resolution is logically implied by the original set of clauses.
o This ensures that the resolution method does not produce incorrect results.
2. Refutational Completeness:
o The resolution method is refutationally complete. This means:
 If the set of clauses is unsatisfiable, the resolution process will eventually derive the empty clause (⊥).
 If the set of clauses is satisfiable, no contradiction will be derived.

Proof of Completeness

The completeness proof relies on two main properties of resolution:

1. Ground Case (Propositional Logic Completeness)

 In propositional logic, every formula can be reduced to a finite set of clauses in CNF.
 The resolution rule ensures that if two clauses contain complementary literals, a new
clause can be derived, progressively simplifying the problem.
 Given the compactness theorem of propositional logic, if a set of clauses is unsatisfiable, the resolution method will eventually derive the empty clause (⊥).

Example (Propositional Case):

Consider the set of clauses:

1. A
2. ¬A ∨ B
3. ¬B

Resolution Steps:

1. Resolve clauses 1 and 2 on the complementary pair A / ¬A, producing B.
2. Resolve the result B with clause 3 (¬B), producing the empty clause ⊥.

This shows the set is unsatisfiable.
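
A minimal propositional resolution sketch that derives the empty clause from the example above; literals are strings and "~" marks negation (these representation choices are illustrative):

from itertools import combinations

def negate(lit):
    return lit[1:] if lit.startswith("~") else "~" + lit

def resolve(c1, c2):
    """Yield all resolvents of two clauses (one per complementary literal pair)."""
    for lit in c1:
        if negate(lit) in c2:
            yield (c1 - {lit}) | (c2 - {negate(lit)})

def is_unsatisfiable(clauses):
    clauses = set(clauses)
    while True:
        new = set()
        for c1, c2 in combinations(clauses, 2):
            for r in resolve(c1, c2):
                if not r:                 # empty clause: contradiction found
                    return True
                new.add(frozenset(r))
        if new <= clauses:                # no new clauses can be derived
            return False
        clauses |= new

# The example above: {A}, {~A or B}, {~B}
kb = {frozenset({"A"}), frozenset({"~A", "B"}), frozenset({"~B"})}
print(is_unsatisfiable(kb))   # True -> the set is unsatisfiable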

2. First-Order Logic Completeness

In first-order logic, the completeness proof is more complex because of the need to account
for infinite domains. The resolution method extends to first-order logic by:

 Converting Formulas to Clausal Form:
o Convert the first-order logic formula into prenex form and then into CNF.
o Skolemize existential quantifiers to eliminate them.
 Ground Instances:
o First-order resolution operates on ground instances of clauses, meaning
clauses with all variables substituted by constants from the domain.
o The Herbrand universe (set of all possible ground terms) guarantees that any
first-order formula has a finite representation for any given query.
 Lifting Lemma:
o Completeness in first-order logic relies on the lifting lemma, which states
that if a refutation exists for the ground instances of the clauses, a refutation
exists for the original clauses.

Key Results:

1. Herbrand’s Theorem: Any unsatisfiable set of first-order clauses has a finite subset
of ground instances that is also unsatisfiable.
2. Resolution Completeness: If a set of first-order clauses is unsatisfiable, the
resolution method will derive the empty clause.

Steps to Ensure Completeness in Practice

1. Clause Normalization:
o Convert all formulas into CNF.
o Ensure no syntactic errors in transformation (e.g., avoid losing information
during Skolemization).
2. Fairness of Selection:
o During resolution, ensure all possible pairs of clauses are considered. This
guarantees that no valid derivation is missed.
3. Termination:
o For finite domains, resolution terminates when either the empty clause is
derived (proving unsatisfiability) or no more resolutions are possible
(indicating satisfiability).
o For infinite domains, heuristics like breadth-first search ensure progress
toward refutation.

Limitations of Completeness

 Computational Complexity:
o While complete, the resolution method may be computationally expensive,
especially for large or infinite domains.
o Finding a resolution refutation can involve exploring an exponential number
of possible clause pairs.
 Decidability:
o While propositional logic is decidable, first-order logic is only semi-decidable.
If a set of first-order clauses is satisfiable, the resolution method may run
indefinitely without finding a refutation.

 The resolution method is refutationally complete, meaning it can derive the empty clause for any unsatisfiable set of clauses in propositional or first-order logic.
 Proof of completeness involves demonstrating that all contradictions in a set of
clauses can be systematically derived using the resolution rule.
 While theoretically complete, practical implementation must address computational
limitations through heuristics and optimization techniques.
15a. Explain the Constraint Satisfaction Problem with an example.

Constraint Satisfaction Problem (CSP)

A Constraint Satisfaction Problem (CSP) is a mathematical problem defined as a set of objects whose state must satisfy a number of constraints or limitations. CSPs are commonly used in Artificial Intelligence (AI) for solving problems like scheduling, planning, and resource allocation.

Components of a CSP

1. Variables:
o The set of variables to be assigned values.
o Denoted as X = {X1, X2, …, Xn}.
2. Domains:
o Each variable has a domain of possible values.
o Denoted as D = {D1, D2, …, Dn}, where Di is the domain of Xi.
3. Constraints:
o Rules that restrict the values variables can simultaneously take.
o Constraints can involve one or more variables and are expressed as relations,
such as:
 Unary constraint: a constraint on a single variable (e.g., X1 ≠ 2).
 Binary constraint: a constraint between two variables (e.g., X1 ≠ X2).
 Global constraint: Constraints involving multiple variables (e.g., all
variables must have distinct values).

Example of a CSP: Map Coloring Problem


Problem Statement:

Color a map of regions such that:

1. Adjacent regions must have different colors.
2. Available colors are Red (R), Green (G), and Blue (B).

Representation:

1. Variables:
o X = {A, B, C, D, E}, where A, B, C, D, E are regions of the map.
2. Domains:
o D_A = D_B = D_C = D_D = D_E = {R, G, B}.
3. Constraints:
o A ≠ B, A ≠ C, B ≠ D, C ≠ D, C ≠ E, D ≠ E.

Solving the CSP


Steps to Solve a CSP

1. Constraint Graph Representation:
o Represent the problem as a graph where:
 Nodes represent variables.
 Edges represent constraints.
o For the map coloring problem, the constraint graph is:

A -- B
|    |
C -- D
 \  /
  E

2. Search for a Solution:
o Assign values to variables such that all constraints are satisfied.

Solution Techniques

1. Backtracking Search:
o Assign values to variables one at a time and backtrack when a constraint is violated (a code sketch follows this list).
o Example:
 Assign A = R.
 Assign B = G (satisfies A ≠ B).
 Assign C = B (satisfies A ≠ C).
 Continue until all constraints are satisfied.
2. Constraint Propagation:
o Use techniques like arc-consistency (AC-3) to reduce the domain of variables
by eliminating values that violate constraints.
3. Heuristics:
o Use heuristics to optimize the search process:
 Most Constrained Variable (MCV): Assign values to the variable with
the fewest legal values first.
 Least Constraining Value (LCV): Choose a value that least restricts
the domains of other variables.
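
Here is the backtracking sketch referred to above, written for this map-coloring instance (the variable ordering and data layout are illustrative choices):

# A minimal backtracking sketch for the map-coloring CSP above.
neighbors = {"A": {"B", "C"}, "B": {"A", "D"}, "C": {"A", "D", "E"},
             "D": {"B", "C", "E"}, "E": {"C", "D"}}
colors = ["R", "G", "B"]

def consistent(var, value, assignment):
    # The only constraints are "adjacent regions must differ".
    return all(assignment.get(n) != value for n in neighbors[var])

def backtrack(assignment):
    if len(assignment) == len(neighbors):
        return assignment                       # all variables assigned
    var = next(v for v in neighbors if v not in assignment)
    for value in colors:
        if consistent(var, value, assignment):
            result = backtrack({**assignment, var: value})
            if result is not None:
                return result
    return None                                 # dead end: backtrack

print(backtrack({}))
# -> a valid coloring, e.g. {'A': 'R', 'B': 'G', 'C': 'G', 'D': 'R', 'E': 'B'}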

Example Solution

For the map coloring problem:

1. Assign A = R.
2. Assign B = G (satisfies A ≠ B).
3. Assign C = B (satisfies A ≠ C).
4. Assign D = R (satisfies B ≠ D, C ≠ D).
5. Assign E = G (satisfies D ≠ E, C ≠ E).

Solution:

A = R, B = G, C = B, D = R, E = G

Applications of CSPs

1. Scheduling:
o Allocating time slots for tasks or resources while meeting constraints.
o Example: Exam timetabling, where no two exams in the same room occur
simultaneously.
2. Planning:
o Determining actions to achieve a goal under constraints.
o Example: Planning a delivery route with time and resource limitations.
3. Puzzles and Games:
o Solving problems like Sudoku, crossword puzzles, or n-queens.
4. Resource Allocation:
o Assigning limited resources to tasks under constraints.
o Example: Allocating workers to shifts while meeting availability and skill
constraints.

A Constraint Satisfaction Problem (CSP) consists of variables, domains, and constraints. It is solved using techniques like backtracking, constraint propagation, and heuristics. CSPs are widely used in real-world applications requiring intelligent decision-making under constraints. The map coloring problem is a classic example illustrating CSP concepts.

15b. Explain local search in CSP

Local Search in Constraint Satisfaction Problems (CSPs)

Local search is an optimization approach used to solve Constraint Satisfaction Problems (CSPs) by iteratively exploring the search space to find a solution. Unlike systematic methods like backtracking, which explore all possibilities exhaustively, local search focuses on improving an initial solution through small modifications.

Local search is particularly effective for large CSPs where systematic search methods are
infeasible due to time or memory constraints.

Key Concepts of Local Search

1. Complete vs. Partial Assignments:
o Local search typically works with complete assignments, where all variables
have values, even if they violate constraints.
o The objective is to reduce the number of violated constraints by iteratively
improving the current assignment.
2. Objective Function:
o The objective is to minimize the number of constraints violated.
o For a feasible solution, the number of violated constraints is zero.
3. Neighborhood:
o The neighborhood of a solution consists of all assignments that can be
reached by changing the value of one or more variables.
4. Hill-Climbing and Variants:
o Start with an initial assignment and iteratively move to a neighboring
assignment that improves the objective function.
o May get stuck in local optima, so strategies like random restarts or simulated
annealing are used to escape.

Steps of Local Search in CSP

1. Initialization:
o Start with a random or heuristic-based complete assignment of values to
variables.
2. Evaluation:
o Calculate the objective function (number of violated constraints).
3. Neighbor Selection:
o Select a neighboring solution by modifying the value of one or more variables.
o Choose the modification that reduces the number of violated constraints.
4. Iteration:
o Replace the current solution with the selected neighbor.
o Repeat until the solution satisfies all constraints or a stopping condition is met.
5. Escape Strategies (if stuck):
o Random Restart: Restart the search from a new random solution.
o Simulated Annealing: Accept worse solutions with some probability to escape
local optima.
o Tabu Search: Keep track of recently visited solutions to avoid cycles.

Example: Local Search in a CSP


Problem: Map Coloring

We need to color a map such that no two adjacent regions share the same color. Variables
represent regions, and domains are the colors available (Red, Green, Blue).

Steps:

1. Initialization: Assign random colors to all regions:

A = R, B = R, C = G, D = G, E = B

2. Evaluation: Count the violated constraints:
o A ≠ B is violated (A = R, B = R) and C ≠ D is violated (C = G, D = G).
o Total violations: 2.

3. Neighbor Selection: Recolor A to B (Blue):

A = B, B = R, C = G, D = G, E = B

4. Re-evaluation: A ≠ B is now satisfied; only C ≠ D remains violated.
o Total violations: 1.

5. Iteration: Continue selecting neighbors that reduce violations. Recoloring C to R removes the last conflict, giving a solution where no constraints are violated:

A = B, B = R, C = R, D = G, E = B
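
A minimal min-conflicts local-search sketch for the same map-coloring CSP (the random seed and step limit below are arbitrary choices):

import random

neighbors = {"A": {"B", "C"}, "B": {"A", "D"}, "C": {"A", "D", "E"},
             "D": {"B", "C", "E"}, "E": {"C", "D"}}
colors = ["R", "G", "B"]

def conflicts(var, value, assignment):
    return sum(assignment[n] == value for n in neighbors[var])

def min_conflicts(max_steps=1000, seed=0):
    rng = random.Random(seed)
    # Start from a complete (possibly inconsistent) random assignment.
    assignment = {v: rng.choice(colors) for v in neighbors}
    for _ in range(max_steps):
        conflicted = [v for v in neighbors
                      if conflicts(v, assignment[v], assignment) > 0]
        if not conflicted:
            return assignment            # zero violated constraints: done
        var = rng.choice(conflicted)
        # Move to the neighbouring assignment that minimises conflicts.
        assignment[var] = min(colors,
                              key=lambda c: conflicts(var, c, assignment))
    return None                          # stopping condition reached

print(min_conflicts())   # a conflict-free coloring, found without backtracking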

Advantages of Local Search

1. Efficiency:
o Scales well to large CSPs since it avoids exhaustive exploration of the search
space.
2. Simplicity:
o Easy to implement and can be applied to a wide range of problems.
3. Memory Usage:
o Requires less memory compared to systematic methods like backtracking.
4. Heuristics:
o Can incorporate domain-specific heuristics for faster convergence.

Disadvantages of Local Search

1. Local Optima:
o May get stuck in a local optimum without reaching a global solution.
2. Incomplete:
o Does not guarantee finding a solution even if one exists.
3. Parameter Sensitivity:
o Performance depends on parameters like the number of restarts or cooling
schedules in simulated annealing.

Applications of Local Search in CSP

1. Timetabling:
o Assigning timeslots to exams or classes while satisfying constraints.
2. Sudoku and Puzzles:
o Solving combinatorial puzzles with constraints.
3. Job Scheduling:
o Allocating tasks to workers or machines while meeting deadlines and
dependencies.
4. Resource Allocation:
o Assigning resources to tasks in constrained environments.

Local search for CSPs is an optimization technique that focuses on iteratively improving a
complete assignment of values to variables by reducing the number of constraint violations.
While efficient and memory-friendly, it may struggle with local optima and incompleteness,
making escape strategies essential for robust performance. It is particularly effective for
large-scale or complex CSPs where systematic methods become computationally
prohibitive.
VIVEKANANDHA COLLEGE OF TECHNOLOGY FOR WOMEN

Department of Artificial Intelligence and Data Science

ASSIGNMENT-1

Year/Sem: II/ III

AL3391 – ARTIFICIAL INTELLIGENCE [R-2021]


Date of submission: 07.10.2024 Max. Marks: 40

Answer all the questions


Q.No. | Question | Marks | CO | BTL
1. | Explain about different kinds of intelligent agents with a neat diagram. | 8 | CO1 | RE
2. | What are informed search techniques? Explain any one with an example. | 8 | CO1 | RE
3. | Describe local search algorithms. | 8 | CO2 | RE
4. | Write about the AO* algorithm with an example. | 8 | CO2 | RE
5. | Write short notes on Monte Carlo search. | 8 | CO3 | RE

Course Coordinator HoD


VIVEKANANDHA COLLEGE OF TECHNOLOGY FOR WOMEN

Department of Artificial Intelligence and Data Science

ASSIGNMENT-1
ANSWER KEY

Year/Sem: II/ III

AL3391 – ARTIFICIAL INTELLIGENCE [R-2021]


Date of submission: 07.10.2024 Max. Marks: 40

1. Different Kinds of Intelligent Agents


Intelligent agents are autonomous entities that observe and act upon an environment
to achieve certain goals. These agents can be categorized based on their capabilities and
complexity:
 Simple Reflex Agents: These agents respond directly to percepts (current
situations). They select actions on the basis of the current percept, ignoring the
rest of the percept history. This approach works well for environments where the
correct decision is always based on the current state, but can fail in more complex
scenarios.
Example: A thermostat that turns on the heater when the temperature drops
below a certain point.
 Model-based Reflex Agents: These agents maintain an internal state to keep
track of the world that cannot be seen directly. They update the state using
information about how the world evolves and how their actions affect the
environment.
Example: A robot vacuum that remembers the map of the house and cleans
efficiently based on this knowledge.
 Goal-based Agents: In addition to the current state, these agents have a goal that
describes desirable situations. They act to achieve their goals by considering
future consequences of actions. These agents are more flexible but require more
computation.
Example: A GPS navigation system that plans a route based on the goal of
reaching a destination.
 Utility-based Agents: These agents use a utility function to measure the
goodness of different states. They aim not just to achieve goals but to maximize
overall performance by selecting actions that offer the best trade-offs.
Example: An autonomous car that chooses routes based on fuel efficiency and
time.
 Learning Agents: Learning agents can improve their performance over time by
learning from their experiences. They are capable of adapting to changes in the
environment and can optimize their actions over time.
Example: A self-driving car that improves its navigation over time through
experience.
2. Informed Search Techniques
Informed search techniques use problem-specific knowledge to find solutions more
efficiently than uninformed searches. They incorporate heuristics to guide the search.
 A* Search Algorithm (Example): A* search is one of the most popular informed
search algorithms, combining features of both uniform-cost search and greedy
best-first search. It finds the least-cost path to the goal by using a heuristic
function h(n), which estimates the cost to reach the goal from node n, and a
cost function g(n), which gives the exact cost to reach n from the start node.
The algorithm explores paths that appear most promising based on
f(n) = g(n) + h(n), where f(n) is the total estimated cost of the solution path
going through node n.
Example: In a navigation system, A* could calculate the shortest path to a
destination by estimating the remaining distance (as a heuristic) and summing it
with the actual distance traveled so far.
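
A compact A* sketch; the graph, edge costs, and heuristic values below are made up for illustration:

import heapq

graph = {"S": {"A": 1, "B": 4}, "A": {"B": 2, "G": 6}, "B": {"G": 3}, "G": {}}
h = {"S": 5, "A": 4, "B": 2, "G": 0}    # admissible estimates of cost to G

def a_star(start, goal):
    frontier = [(h[start], 0, start, [start])]   # entries: (f, g, node, path)
    best_g = {start: 0}
    while frontier:
        f, g, node, path = heapq.heappop(frontier)  # expand lowest f = g + h
        if node == goal:
            return path, g
        for nxt, cost in graph[node].items():
            g2 = g + cost
            if g2 < best_g.get(nxt, float("inf")):
                best_g[nxt] = g2
                heapq.heappush(frontier, (g2 + h[nxt], g2, nxt, path + [nxt]))
    return None

print(a_star("S", "G"))   # (['S', 'A', 'B', 'G'], 6) -> the least-cost path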

3. Local Search Algorithms


Local search algorithms operate by iteratively improving a single solution until an
optimal or satisfactory solution is found. Unlike other search algorithms that explore the
entire search space, local search focuses on exploring the immediate neighbors of the
current solution.
 Hill Climbing: This algorithm continually moves towards the neighbor with a
better score (in the case of maximization) or a lower score (in minimization). It’s
simple but can get stuck in local optima.
 Simulated Annealing: This algorithm introduces randomness to escape local
optima by allowing downhill moves with a probability that decreases over time,
much like cooling metal slowly to avoid defects.
 Genetic Algorithms: Inspired by natural selection, these algorithms maintain a
population of solutions, evolving them over generations by selecting the best,
recombining, and mutating them.
Example of Local Search: For the Traveling Salesman Problem (TSP), a local
search algorithm might start with a random tour and then try to improve it by
swapping two cities' positions, thereby finding shorter tours iteratively.

4. AO* Algorithm
The AO* algorithm is used for solving problems that can be represented as an AND-
OR graph. It works by searching the graph in a best-first manner and is suited for
problems that involve decision-making where the outcome is dependent on multiple sub-
goals (AND) or where one of several sub-goals needs to be achieved (OR).
 Steps in the AO* Algorithm:
1. Start at the root node.
2. Select the best node based on the heuristic function.
3. If the node is an OR node, expand it by selecting one of its children.
4. If the node is an AND node, expand all its children.
5. Continue expanding until a solution is found or no further nodes can be
expanded.
Example: AO* is used in game-playing and decision-tree problems, where
different strategies are explored, and combinations of outcomes (ANDs) or
alternatives (ORs) are considered.
5. Monte Carlo Search
Monte Carlo search is a technique used to make decisions in uncertain
environments, particularly in game-playing AI. It involves running many random
simulations from the current position to estimate the potential outcomes of different
actions.
 Monte Carlo Tree Search (MCTS): This is a popular variant, where the search is
structured as a tree, and each node represents a game state. Simulations are run
by playing random moves, and the tree is updated based on the results. Over
time, better actions are explored more frequently.
Example: MCTS is widely used in AI systems for board games like Go and Chess,
where it helps the AI select moves that have the highest likelihood of success
based on multiple simulated outcomes.
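
A minimal flat Monte Carlo sketch (random playouts per candidate move, which is simpler than full MCTS), using the game of Nim as an assumed example:

import random

def random_playout(stones, my_turn, rng):
    """Play random legal moves (take 1-3 stones); last stone taken wins."""
    while stones > 0:
        stones -= rng.randint(1, min(3, stones))
        my_turn = not my_turn
    return not my_turn      # the player who just moved took the last stone

def best_move(stones, n_sims=2000, seed=0):
    rng = random.Random(seed)
    scores = {}
    for move in range(1, min(3, stones) + 1):
        # Estimate each move's value by its win rate over random playouts.
        wins = sum(random_playout(stones - move, my_turn=False, rng=rng)
                   for _ in range(n_sims))
        scores[move] = wins / n_sims
    return max(scores, key=scores.get), scores

print(best_move(10))   # usually prefers taking 2, leaving 8 (a losing position)
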
VIVEKANANDHA COLLEGE OF TECHNOLOGY FOR WOMEN

Department of Artificial Intelligence and Data Science

ASSIGNMENT-2

Year/Sem: II/ III

AL3391 – ARTIFICIAL INTELLIGENCE [R-2021]


Date of submission: 20.11.2024 Max. Marks: 40

Answer all the questions


Q.No. | Question | Marks | CO | BTL
1. | Discuss backtracking in CSP. | 8 | CO3 | RE
2. | Give the five logical connectives used to construct complex sentences and give the formal grammar of propositional logic. | 8 | CO4 | RE
3. | Give the completeness proof of resolution. | 8 | CO4 | RE
4. | Explain the method of performing exact inference. | 8 | CO5 | RE
5. | How to represent knowledge in an uncertain domain? | 8 | CO5 | RE

Course Coordinator HoD


VIVEKANANDHA COLLEGE OF TECHNOLOGY FOR WOMEN

Department of Artificial Intelligence and Data Science

ASSIGNMENT-2
ANSWER KEY

Year/Sem: II/ III

AL3391 – ARTIFICIAL INTELLIGENCE [R-2021]


Date of submission: 20.11.2024 Max. Marks: 40

1. Backtracking in CSP (Constraint Satisfaction Problems)


Backtracking is a search algorithm used to solve Constraint Satisfaction Problems
(CSPs), where the goal is to find values for a set of variables that satisfy certain
constraints. Here's a brief overview of how it works:
 Initial Setup: You have a set of variables, each with a domain of possible values,
and a set of constraints that describe legal combinations of values.
 Search Process:
1. Variable Assignment: Assign a value to a variable from its domain.
2. Constraint Checking: Ensure that the current assignment satisfies all constraints
(e.g., no two variables with the same color in a graph coloring problem).
3. Backtracking: If a violation of constraints is detected, backtrack by undoing the last
assignment and trying another value.
4. Continue: Repeat the process until a solution is found or all possibilities are
exhausted.
Backtracking can be optimized using techniques like forward checking (checking the
effects of assignments on other variables immediately) or constraint propagation
(eliminating infeasible values earlier).

2. Five Logical Connectives and Formal Grammar of Propositional Logic


In propositional logic, five logical connectives are commonly used to construct complex
sentences:
1. Negation (¬): Reverses the truth value of a statement.
o Example: ¬P (not P)
2. Conjunction (∧): True if both operands are true.
o Example: P ∧ Q (P and Q)
3. Disjunction (∨): True if at least one operand is true.
o Example: P ∨ Q (P or Q)
4. Implication (→): True unless the first operand is true and the second is false.
o Example: P → Q (If P, then Q)
5. Biconditional (↔): True if both operands are either true or false.
o Example: P ↔ Q (P if and only if Q)
Formal Grammar of Propositional Logic:
A formal grammar can be defined as follows:
 Atoms (Propositions): Any lowercase letter (e.g., P, Q, R) represents a basic
proposition.
 Syntax Rules:
1. Atomic Formula: Any proposition P is an atomic formula.
2. Negation: If φ is a formula, then ¬φ is also a formula.
3. Conjunction: If φ and ψ are formulas, then φ ∧ ψ is a formula.
4. Disjunction: If φ and ψ are formulas, then φ ∨ ψ is a formula.
5. Implication: If φ and ψ are formulas, then φ → ψ is a formula.
6. Biconditional: If φ and ψ are formulas, then φ ↔ ψ is a formula.
This grammar allows for the construction of complex logical sentences.
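
This grammar maps naturally onto a recursive data type; a minimal sketch (Python 3.10+, with illustrative names):

from dataclasses import dataclass

# A formula is an atom, a negation, or a binary connective on two sub-formulas.
@dataclass
class Atom:
    name: str

@dataclass
class Not:
    arg: "Formula"

@dataclass
class BinOp:
    op: str            # one of "and", "or", "->", "<->"
    left: "Formula"
    right: "Formula"

Formula = Atom | Not | BinOp

def evaluate(f, world):
    """Truth value of formula f under an assignment of atoms to booleans."""
    match f:
        case Atom(name):          return world[name]
        case Not(arg):            return not evaluate(arg, world)
        case BinOp("and", l, r):  return evaluate(l, world) and evaluate(r, world)
        case BinOp("or", l, r):   return evaluate(l, world) or evaluate(r, world)
        case BinOp("->", l, r):   return (not evaluate(l, world)) or evaluate(r, world)
        case BinOp("<->", l, r):  return evaluate(l, world) == evaluate(r, world)

# (P -> Q) evaluated where P is true and Q is false:
print(evaluate(BinOp("->", Atom("P"), Atom("Q")), {"P": True, "Q": False}))  # False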

3. Completeness of Resolution


In logic, resolution is a rule of inference used in automated theorem proving,
particularly in propositional and first-order logic. It is based on refutation: attempting to
prove that a negated proposition leads to a contradiction.
Completeness of Resolution means that if a set of clauses is unsatisfiable (i.e., a
contradiction exists), resolution will eventually derive an empty clause, proving the
unsatisfiability. This makes the resolution method complete for propositional logic, as it
guarantees that if a contradiction exists, it will be found through the application of the
resolution rule.

4. Exact Inference Method


Exact inference refers to drawing conclusions from knowledge based on a logical system
without approximations. Methods for exact inference in probabilistic or logic-based
systems include:
1. Deductive Inference: Using formal rules to derive conclusions. In propositional
logic, it involves applying inference rules (such as Modus Ponens or resolution) to derive
conclusions from premises.
2. Bayesian Inference: In probabilistic reasoning, Bayesian inference calculates the
posterior probability of a hypothesis given prior knowledge and observed evidence using
Bayes' theorem.
3. Forward and Backward Chaining: In rule-based systems, forward chaining starts
from known facts and applies rules to infer new facts, while backward chaining works
backward from a goal to find supporting facts.
These methods provide exact, rigorous results based on the available information.

5. Representing Knowledge in an Uncertain Domain


In uncertain domains, knowledge representation must account for uncertainty and
incomplete information. Common approaches include:
1. Probabilistic Logic: Uses probability theory to model uncertainty. For example,
Bayesian networks represent a set of variables and their conditional dependencies using
directed acyclic graphs (DAGs), where nodes represent variables, and edges represent
probabilistic dependencies.
2. Fuzzy Logic: Allows for degrees of truth rather than a strict true/false evaluation.
Fuzzy sets model uncertainty in terms of membership degrees (e.g., a temperature value
might be "somewhat hot").
3. Markov Decision Processes (MDPs): Used to represent decision-making problems in
uncertain environments, where the outcome of actions is probabilistic, and decisions
must be made under uncertainty.
4. Non-monotonic Logic: Deals with reasoning where adding new information may
invalidate previous conclusions. This is important in dynamic or uncertain environments,
where new evidence can change the outcome of a decision.
