0% found this document useful (0 votes)

12 views

AIML

Esay to learn

Uploaded by

Jemima A

Available Formats

Download as PDF, TXT or read online on Scribd

0% found this document useful (0 votes)

12 views

AIML

Esay to learn

Uploaded by

Jemima A

Available Formats

Download as PDF, TXT or read online on Scribd

You are on page 1/ 159

MAILAM ENGINEERING COLLEGE

Mailam (Po), Tindivanam (Tk.), Villupuram (Dt.) Pin: 604 304

Approved by AICTE, New Delhi, Affiliated to Anna University, Chennai and
Accredited by National Board of Accreditation (NBA) & TATA Consultancy Services (TCS)

CS3491 – ARTIFICIAL INTELLIGENCE AND MACHINE LEARNING

[REGULATION-2021]

STUDY MATERIAL

DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING

NAME OF THE STUDENT:…………………………………………………………………….

YEAR / SEM:………………………………………………………………………………………...

ACADEMIC YEAR:………………………………………………………………………………

PREPARED BY
1. Mr.R.MOHAN, AP/CSE
2. Mrs.G.SELVAPRIYA, AP/CSE
3. Mr.S.SHANMUGANATHAN, AP/CSE
CS 3491 – ARTIFICIAL INTELLIGENCE AND MACHINE LEARNING UNIT 1

MAILAM ENGINEERING COLLEGE

Mailam – 604 304
(Approved by AICTE-New Delhi, Affiliated to Anna University-Chennai,
Accredited by National Board of Accreditation-New Delhi & TCS-Chennai)
DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING
II YEAR / III SEM
CS3491 – ARTIFICIAL INTELLIGENCE AND MACHINE LEARNING
UNIT 1 – PROBLEM SOLVING

SYLLABUS:
Introduction to AI - AI Applications - Problem solving agents – search
algorithms – uninformed search strategies – Heuristic search
strategies – Local search and optimization problems – adversarial
search – constraint satisfaction problems (CSP)

PART A
1. What is Artificial Intelligence? Or Define Artificial Intelligence.
 Artificial intelligence is a wide-ranging branch of computer science
concerned with building smart machines capable of performing tasks
that typically require human intelligence.
 Artificial Intelligence is the process of building intelligent machines from
vast volumes of data.
 Systems learn from past learning and experiences and perform human-
like tasks.
 It enhances the speed, precision, and effectiveness of human efforts.

2. How Artificial Intelligence is Defined? List the Types Of

Approaches for Artificial Intelligence.
 Thinking humanly: mimicking thought based on the human mind.
 Thinking rationally: mimicking thought based on logical reasoning.
 Acting humanly: acting in a manner that mimics human behavior.
 Acting rationally: acting in a manner that is meant to achieve a
particular goal.

3. What Are The Four Types Of Artificial Intelligence?

 Reactive machines: able to perceive and react to the world in front of it as
it performs limited tasks.
 Limited memory: able to store past data and predictions to inform
predictions of what may come next.
 Theory of mind: able to make decisions based on its perceptions of how
others feel and make decisions.

PREPARED BY: R.Mohan, G.Selvapriya, S.Shanmuganathan, AP/CSE 1

CS 3491 – ARTIFICIAL INTELLIGENCE AND MACHINE LEARNING UNIT 1

 Self-awareness: able to operate with human-level consciousness and

understand its own existence.

4. List the types of Artificial Intelligence-based on capabilities.

 Narrow AI
 General AI
 Super AI

5. List the types of Artificial Intelligence-based on functionalities -

 Purely Reactive or Reactive Machines
 Limited Memory
 Theory of Mind
 Self Awareness

6. Tabulate the comparison of Artificial Intelligence-based on

functionalities.
Reactive Machines Limited Memory Theory of Mind Self-Awareness
Simple classification Human-level intelligence
Complex classification Understands human
and pattern that can by-pass human
tasks reasoning and motives
recognition tasks intelligence too
Needs fewer examples
Great when all Uses historical data Sense of self-
to learn because it
parameters are known to make predictions consciousness
understands motives

Can’t deal with Next milestone for the

Current state of AI Does not exist yet
imperfect information evolution of AI

7. List the Four possible goals to pursue in artificial intelligence.

• Systems that think like humans.
• Systems that think rationally.
• Systems that act like humans
• Systems that act rationally

8. Point out the subfields of Artificial Intelligence.

 Machine Learning
 Deep Learning
 Neural Networks
 Cognitive Computing
 Computer Vision
 Natural Language Processing

PREPARED BY: R.Mohan, G.Selvapriya, S.Shanmuganathan, AP/CSE 2

CS 3491 – ARTIFICIAL INTELLIGENCE AND MACHINE LEARNING UNIT 1

9. What are the advantages of Artificial Intelligence?

 Reduced human error
 Risk avoidance
 Replacing repetitive jobs
 Digital assistance

10. List the limitations of Artificial Intelligence

 High cost of creation.
 No emotions.
 Box thinking.
 Can’t think for itself.

11. Differentiate Deep learning vs. machine learning

o Machine Learning
 Machine Learning is the learning in which a machine can learn on its
own from examples and previous experiences.
 It is machine learning that gives AI the ability to learn.
 This is done by using algorithms to discover patterns and generate
insights from the data they are exposed to.
o Deep Learning
 Deep learning, is a subcategory of machine learning, provides AI with
the ability to mimic a human brain’s neural network.
 It can make sense of patterns, noise, and sources of confusion in the
data.
12. Compare Data Science, Artificial Intelligence and Machine Learning.

Data Science Artificial Intelligence Machine Learning

PREPARED BY: R.Mohan, G.Selvapriya, S.Shanmuganathan, AP/CSE 3

CS 3491 – ARTIFICIAL INTELLIGENCE AND MACHINE LEARNING UNIT 1

AI includes predictive
Data Science includes data
modeling to predict events ML is a subset of Artificial
operations based on user
based on the previous and Intelligence.
requirements.
current data.
It is mainly used in fraud Applications of AI include Online recommendations, facial
detection, healthcare, BI chat bots, voice assistants, recognition, and NLP are a few
analysis, and more. and weather prediction. examples of ML.

13. List the various Applications of Artificial Intelligence or AI

Applications.
 AI Application in E-Commerce
 Personalized Shopping
 AI-powered Assistants
 Fraud Prevention
 AI Application in Education
 Administrative Tasks Automated to Aid Educators
 Creating Smart Content
 Voice Assistants
 Personalized Learning
 AI Application in Lifestyle
 Autonomous Vehicles
 Spam Filters
 Facial Recognition
 Recommendation System
 AI Application in Navigation
 AI Application in Robotics
 Carrying goods in hospitals, factories, and warehouses
 Cleaning offices and large equipment
 Inventory management
 AI Application in Human Resource
 AI Application in Healthcare
 AI Application in Agriculture
 AI Application in Gaming
 AI Application in Automobiles
 AI Application in Social Media
 Instagram
 Facebook
 Twitter
 AI Application in Marketing
 AI Application in Chatbots

PREPARED BY: R.Mohan, G.Selvapriya, S.Shanmuganathan, AP/CSE 4

CS 3491 – ARTIFICIAL INTELLIGENCE AND MACHINE LEARNING UNIT 1

14. Define an Agent and list its types.

Agent
 An agent is anything that can be viewed as perceiving its environment
through sensors and acting upon that environment through actuators.

Types of Agent
 Simple reflex agents respond directly to percepts.
 Model-based reflex agents maintain internal state to track aspects of
the world that are not evident in the current percept.
 Goal-based agents act to achieve their goals, and
 Utility-based agents try to maximize their own expected “happiness.”

15. Define an environment and task environment.

Environments
 The environment could be everything—the entire universe.
Task Environment
 A task environment specification includes the performance measure,
the external environment, the actuators, and the sensors.

16. List the steps in designing an agent

 In designing an agent, the first step must always be to specify the task
environment as fully as possible.
 The agent program implements the agent function.
 There exists a variety of basic agent program designs reflecting the
kind of information made explicit and used in the decision process.
 The designs vary in efficiency, compactness, and flexibility.
 The appropriate design of the agent program depends on the nature of
the environment.
 All agents can improve their performance through learning.

17. Define Problem Solving Agent. List the steps involved in Problem
Solving Agent.
 Problem Solving Agent
 Problem-solving agent is a goal-based agent that focuses on goals
using a group of algorithms and techniques to solve a well-defined
problem.

PREPARED BY: R.Mohan, G.Selvapriya, S.Shanmuganathan, AP/CSE 5

CS 3491 – ARTIFICIAL INTELLIGENCE AND MACHINE LEARNING UNIT 1

An agent may need to plan a sequence of actions that form a path
to a goal state.
 Such an agent is called a problem-solving agent, and the
computational process it undertakes is called search.
 Steps performed by Problem-solving agent
 Goal Formulation
 Problem Formulation
 Search
 Solution
 Execution

18. List the Components in problem formulation.

a) Initial State: It is the starting state or initial step of the agent towards
its goal.
b) Actions: It is the description of the possible actions available to the
agent.
c) Transition Model: It describes what each action does.
d) Goal Test: It determines if the given state is a goal state.
e) Path cost: It assigns a numeric cost to each path that follows the
goal.

19. Define search algorithms and list the types of search algorithm.
 A search algorithm takes a search problem as input and returns a
solution.
 Types of Search Algorithm
 Uninformed Search Algorithms
o Depth First Search
o Depth Limited Search
o Breadth First Search
o Iterative Deepening Search
o Uniform Cost Search
o Bidirectional Search
 Informed (Heuristic) Search Strategies
o Greedy Search
o A* Tree Search

20. Define Depth First Search. List the advantages and disadvantages
of DFS.
• Depth-first search (DFS) is an algorithm for traversing or searching
tree or graph data structures.
• The algorithm starts at the root node and explores as far as possible
along each branch before backtracking.

PREPARED BY: R.Mohan, G.Selvapriya, S.Shanmuganathan, AP/CSE 6

CS 3491 – ARTIFICIAL INTELLIGENCE AND MACHINE LEARNING UNIT 1

• It uses last in-first-out strategy and hence it is implemented using a

stack.
Advantage of Depth-first Search:
 DFS uses extremely little memory.
 It takes less time to reach the goal node than the BFS method.
Disadvantage of Depth-first Search:
 There's a chance that many states will recur, and there's no certainty
that a solution will be found.
 The DFS algorithm performs deep searching and may occasionally
enter an infinite cycle.

21. Which solution would DFS find to move from node S to node G if run
on the graph below?

Solution.

Path: S - > A -> B -> C -> G

22. Define Depth Limited Search. List its advantages and disadvantages.
 A depth-limited search algorithm is similar to depth-first search with a
predetermined limit.
 Depth-limited search can solve the drawback of the infinite path in the
Depth-first search.

PREPARED BY: R.Mohan, G.Selvapriya, S.Shanmuganathan, AP/CSE 7

CS 3491 – ARTIFICIAL INTELLIGENCE AND MACHINE LEARNING UNIT 1

Advantages:
 Depth-limited search is Memory efficient.
Disadvantages:
 Depth-limited search also has a disadvantage of incompleteness.
 It may not be optimal if the problem has more than one solution.

23.Define Breadth-first search (BFS). List its advantages and disadvantages.

• Breadth-first search (BFS) is an algorithm for traversing or searching
tree or graph data structures.
• It starts at the tree root and explores all of the neighbor nodes at the
present depth prior to moving on to the nodes at the next depth level.
Advantages of Breadth-first Search:
 If a solution is available, BFS will provide it.
 If there are multiple answers to a problem, BFS will present the simplest
solution with the fewest steps.
Disadvantages of Breadth-first Search:
 It necessitates a large amount of memory.
 If the solution is located far from the root node, BFS will take a long time.

24. Which solution would BFS find to move from node S to node G if run on
the graph below?

Solution.

Path: S -> D -> G

PREPARED BY: R.Mohan, G.Selvapriya, S.Shanmuganathan, AP/CSE 8

CS 3491 – ARTIFICIAL INTELLIGENCE AND MACHINE LEARNING UNIT 1

25. Define iterative deepening depth-first search/ iterative deepening

search
 This search is a combination of BFS and DFS, as BFS guarantees to
reach the goal node and DFS occupies less memory space.
 Therefore, iterative deepening search combines these two advantages of
BFS and DFS to reach the goal node.
 It gradually increases the depth-limit from 0,1,2 and so on and reach
the goal node.

26. Define Uniform Cost Search. List its advantages and disadvantages.
 A searching algorithm for traversing a weighted tree or graph is uniform-
cost search.
 When a separate cost is provided for each edge, this algorithm is used.
 The uniform-cost search's main purpose is to discover the shortest path
to the goal node with the lowest cumulative cost.
 Cost of a node is defined as:
cost(node) = cumulative cost of all nodes from root
cost(root) = 0
Advantages of Uniform-cost Search:
 The path with the lowest cost is chosen at each state.
 UCS is complete only if states are finite and there should be no loop
with zero weight.
 UCS is optimal only if there is no negative cost.
Disadvantages of Uniform-cost Search:
 It is just concerned with the expense of the path.

27. Which solution would UCS find to move from node S to node G if run on
the graph below?

PREPARED BY: R.Mohan, G.Selvapriya, S.Shanmuganathan, AP/CSE 9

CS 3491 – ARTIFICIAL INTELLIGENCE AND MACHINE LEARNING UNIT 1

Solution.

Path: S -> A -> B -> G

Cost: 5

28. Define Bidirectional Search.

29. Define informed (heuristic) search strategies and mention its types.
 Here, the algorithms have information on the goal state, which helps
in more efficient searching.
 This information is obtained by something called a heuristic.
Search Heuristics:
 In an informed search, a heuristic is a function h(n) estimates how
close a state is to the goal state.
 h(n) = estimated cost of the cheapest path from the state at node n to
a goal state.

Types of Informed search algorithms.

 Greedy Search
 A* Tree Search
 A* Graph Search

PREPARED BY: R.Mohan, G.Selvapriya, S.Shanmuganathan, AP/CSE 10

CS 3491 – ARTIFICIAL INTELLIGENCE AND MACHINE LEARNING UNIT 1

30. Find the path from S to G using greedy search. The heuristic values h
of each node below the name of the node.

Solution.
Starting from S, we can traverse to A(h=9) or D(h=5). We choose D,
as it has the lower heuristic cost. Now from D, we can move to
B(h=4) or E(h=3). We choose E with a lower heuristic cost. Finally,
from E, we go to G(h=0). This entire traversal is shown in the search
tree below, in blue.

Path: S -> D -> E -> G

31. Define A* tree search.

 A* Tree Search, or simply known as A* Search, combines the
strengths of uniform-cost search and greedy search.
 In this search, the heuristic is the summation of the cost in UCS,
denoted by g(x), and the cost in the greedy search, denoted by h(x).
The summed cost is denoted by f(x).

PREPARED BY: R.Mohan, G.Selvapriya, S.Shanmuganathan, AP/CSE 11

CS 3491 – ARTIFICIAL INTELLIGENCE AND MACHINE LEARNING UNIT 1

32. Find the path to reach from S to G using A* search.

Solution.
 Starting from S, the algorithm computes g(x) + h(x) for all nodes in
the fringe at each step, choosing the node with the lowest sum.
 The entire work is shown in the table below.

PREPARED BY: R.Mohan, G.Selvapriya, S.Shanmuganathan, AP/CSE 12

CS 3491 – ARTIFICIAL INTELLIGENCE AND MACHINE LEARNING UNIT 1

33. Use graph searches to find paths from S to G in the following

graph.

Solution

Path: S -> D -> B -> E -> G

Cost: 7

PREPARED BY: R.Mohan, G.Selvapriya, S.Shanmuganathan, AP/CSE 13

CS 3491 – ARTIFICIAL INTELLIGENCE AND MACHINE LEARNING UNIT 1

34. Define Local Search Algorithm. List its types.

Local Search Algorithm - Definition
 Local search algorithms operate by searching from a start state to
neighboring states, without keeping track of the paths, or the set of states
that have been reached.
 “Local search algorithms” where the path cost does not matters, and
only focus on solution-state needed to reach the goal node.
 It has two key advantages:
o use very little memory;
o Often find reasonable solutions in large or infinite state spaces for
which systematic algorithms are unsuitable.
Different types of local searches:
 Hill-climbing Search
Types of Hill climbing search algorithm
 Simple hill climbing
 Steepest-ascent hill climbing
 Stochastic hill climbing
 Random-restart hill climbing
 Simulated Annealing
 Local Beam Search

35. What are the Limitations of Hill climbing algorithm.

 Local Maxima: It is that peak of the mountain which is highest than all
its neighboring states but lower than the global maxima. It is not the goal
peak because there is another peak higher than it.
 Plateau: It is a flat surface area where no uphill exists. It becomes
difficult for the climber to decide that in which direction he should move
to reach the goal point.
 Ridges: It is a challenging problem where the person finds two or more
local maxima of the same height commonly. It becomes difficult for the
person to navigate the right point and stuck to that point itself.

36. Define Adversarial Search.

 Adversarial search is a game-playing technique where the agents are
surrounded by a competitive environment.
 A conflicting goal is given to the agents (multiagent).
 These agents compete with one another and try to defeat one another in
order to win the game.

 Such conflicting goals give rise to the adversarial search.

PREPARED BY: R.Mohan, G.Selvapriya, S.Shanmuganathan, AP/CSE 14

CS 3491 – ARTIFICIAL INTELLIGENCE AND MACHINE LEARNING UNIT 1

37. What are the Elements of Game Playing search?

 S0: It is the initial state from where a game begins.
 PLAYER (s): It defines which player is having the current turn to make a
move in the state.
 ACTIONS (s): It defines the set of legal moves to be used in a state.
 RESULT (s, a): It is a transition model which defines the result of a move.
 TERMINAL-TEST (s): It defines that the game has ended and returns
true.
 UTILITY (s,p): It defines the final value with which the game has ended.
This function is also known as Objective function or Payoff function.
 The price which the winner will get i.e.
o (-1): If the PLAYER loses.
o (+1): If the PLAYER wins.
o (0): If there is a draw between the PLAYERS.

38 List the types of algorithms in adversarial search.

 Minimax Algorithm
 Alpha-beta Pruning

39. Define a Game Tree.

 A game tree is where the nodes represent the states of the game and
edges represent the moves made by the players in the game.
 Players will be two namely:
o MIN: Decrease the chances to win the game.
o MAX: Increases his chances of winning the game.
 MAX will choose that path which will increase its utility value and MIN
will choose the opposite path which could help it to minimize MAX’s
utility value.

40. Define Constraint satisfaction problems (CSP) – Definition.

 A constraint satisfaction problem (CSP) is a problem that requires its
solution within some limitations or conditions also known as constraints.
 It consists of the following:
 A finite set of variables which stores the solution (V = {V1, V2, V3,.....,
Vn})
 A set of discrete values known as domain from which the solution is
picked (D = {D1, D2, D3,.....,Dn})
 A finite set of constraints (C = {C1, C2, C3,......, Cn})

PREPARED BY: R.Mohan, G.Selvapriya, S.Shanmuganathan, AP/CSE 15

CS 3491 – ARTIFICIAL INTELLIGENCE AND MACHINE LEARNING UNIT 1

41. List the CSP Algorithms and Problems.

1. CryptArithmetic (Coding alphabets to numbers.)
2. n-Queen (In an n-queen problem, n queens should be placed in an
nXn matrix such that no queen shares the same row, column or
diagonal.)
3. Map Coloring (coloring different regions of map, ensuring no adjacent
regions have the same color)
4. Crossword (everyday puzzles appearing in newspapers)
5. Sudoku (a number grid)
6. Latin Square Problem

42. List the Types of Domains in CSP.

 Discrete Domain: It is an infinite domain which can have one state
for multiple variables. For example, a start state can be allocated
infinite times for each variable.
 Finite Domain: It is a finite domain which can have continuous
states describing one domain for one specific variable. It is also called
a continuous domain.

43. List the Constraint Types in CSP.

 Unary Constraints: It is the simplest type of constraints that restricts
the value of a single variable.
 Binary Constraints: It is the constraint type which relates two
variables. A value x2 will contain a value which lies
between x1 and x3.
 Global Constraints: It is the constraint type which involves an
arbitrary number of variables.

44. Define Map Coloring / Graph Coloring Problem.

 It is a map of Australia; it consists of states and territories as in
Figure 1.19.
 The task will be coloring each region with the colors either red, green
or blue.
 In such case, no neighboring regions will have the same color.

PREPARED BY: R.Mohan, G.Selvapriya, S.Shanmuganathan, AP/CSE 16

CS 3491 – ARTIFICIAL INTELLIGENCE AND MACHINE LEARNING UNIT 1

PART B
1. Define Artificial Intelligence and its types, goals, advantages and limitations
in detail.

ARTIFICIAL INTELLIGENCE:
1.1 Artificial Intelligence Definition
1.2 Types of Artificial Intelligence
1.2.1 Types of Artificial Intelligence-based on capabilities
1.2.2 Types of Artificial Intelligence-based on functionalities
1.3 Four possible goals to pursue in artificial intelligence:
1.3.1 Acting humanly: The Turing test approach
1.3.2 Thinking humanly: The cognitive modeling approach
1.3.3 Thinking rationally: The “laws of thought” approach
1.3.4 Acting rationally: The rational agent approach
1.4 Subfields of Artificial Intelligence
1.5 Advantages of Artificial Intelligence
1.6 Limitations of Artificial Intelligence

1.1 Artificial Intelligence Definition

 Artificial Intelligence is the process of building intelligent machines
from vast volumes of data.
 Systems learn from past learning and experiences and perform
human-like tasks.
 It enhances the speed, precision, and effectiveness of human efforts.
 AI uses complex algorithms and methods to build machines that can
make decisions on their own as shown in Figure 1.1.
 Machine Learning and Deep learning forms the core of Artificial
Intelligence.

Figure 1.1 – Artificial Intelligence

1.2 Types of Artificial Intelligence

 Artificial Intelligence can be divided based on capabilities and
functionalities as in figure 1.2.
 Types of Artificial Intelligence-based on capabilities -
o Narrow AI
o General AI
o Super AI

PREPARED BY: R.Mohan, G.Selvapriya, S.Shanmuganathan, AP/CSE 17

CS 3491 – ARTIFICIAL INTELLIGENCE AND MACHINE LEARNING UNIT 1

 Types of Artificial Intelligence-based on functionalities -

o Reactive Machines
o Limited Memory
o Theory of Mind
o Self Awareness

Figure 1.2 – Types of Artificial Intelligence

1.2.1 Types of Artificial Intelligence-based on capabilities -

o Narrow AI
o General AI
o Super AI
 Narrow AI:
o Referred as Weak AI, focuses on one automate specific task and
cannot perform beyond its limitations.
o Example: Alexa and other smart assistants, Self-driving cars,
Google search, Email spam filters, Netflix's recommendations,
 Artificial General Intelligence (AGI):
o Referred as “strong AI,” can understand and learn any intellectual
task that a human being can.
o Example: Fujitsu has built the K computer, which is one of the
fastest supercomputers in the world
 Super Intelligence:
o Super AI surpasses human intelligence and can perform any task
better than a human.
o Some of the critical characteristics of super AI include thinking,
solving puzzles, making judgments, and decisions on its own.

1.2.2 Types of Artificial Intelligence-based on functionalities -

o Purely Reactive or Reactive Machines
o Limited Memory
o Theory of Mind
o Self Awareness

PREPARED BY: R.Mohan, G.Selvapriya, S.Shanmuganathan, AP/CSE 18

CS 3491 – ARTIFICIAL INTELLIGENCE AND MACHINE LEARNING UNIT 1

 Purely Reactive or Reactive Machines

o These machines do not have any memory or data to work with,
specializing in just one field of work.
o For example, in a chess game, the machine observes the moves
and makes the best possible decision to win.
 Limited Memory
o These machines collect previous data and continue adding it to
their memory.
o They have enough memory or experience to make proper decisions,
but memory is minimal.
o For example, this machine can suggest a restaurant based on the
location data that has been gathered.
 Theory of Mind
o This kind of AI can understand thoughts and emotions, as well as
interact socially. However, a machine based on this type is yet to
be built.
 Self-Aware
o Self-aware machines are the future generation of the new
technologies. It will be intelligent, sentient, and conscious.

Table 1.1 - Comparison of types of Artificial Intelligence-based on functionalities:

Reactive Machines Limited Memory Theory of Mind Self-Awareness
Simple classification Human-level intelligence
Complex classification Understands human
and pattern that can by-pass human
tasks reasoning and motives
recognition tasks intelligence too
Needs fewer examples
Great when all Uses historical data Sense of self-
to learn because it
parameters are known to make predictions consciousness
understands motives

Can’t deal with Next milestone for the

Current state of AI Does not exist yet
imperfect information evolution of AI

1.3 Four possible goals to pursue in artificial intelligence:

 Systems that think like humans.
 Systems that think rationally.
 Systems that act like humans
 Systems that act rationally
- Refer Figure 1.3

PREPARED BY: R.Mohan, G.Selvapriya, S.Shanmuganathan, AP/CSE 19

CS 3491 – ARTIFICIAL INTELLIGENCE AND MACHINE LEARNING UNIT 1

Four Approaches from two dimensions

Figure 1.3 - Four Approaches from two dimensions

1.3.1 Acting humanly: The Turing test approach

o The Turing test, proposed by Alan Turing (1950), was designed as
a method for determining if a machine is intelligent or not.
o The computer would need the following capabilities:
 natural language processing to communicate successfully in a
human language;
 knowledge representation to store what it knows or hears;
 automated reasoning to answer questions and to draw new
conclusions;
 machine learning to adapt to new circumstances and to detect
and extrapolate patterns.
o However, other researchers have proposed a total Turing test.
o To pass the total Turing test, a robot will need
 computer vision and speech recognition to perceive the world;
 robotics to manipulate objects and move about.

1.3.2 Thinking humanly: The cognitive modeling approach

o Thinking humanly is to make a system or program to think like a
human.
o Can learnabout human thought in three ways:
 Introspection—trying to catch own thoughts as they go by;
 Psychological experiments—observing a person in action;
 Brain imaging—observing the brain in action.

Cognitive Science
o Cognitive science is the interdisciplinary study of mind and
intelligence, embracing philosophy, psychology, artificial
intelligence, neuroscience, linguistics, and anthropology.

PREPARED BY: R.Mohan, G.Selvapriya, S.Shanmuganathan, AP/CSE 20

CS 3491 – ARTIFICIAL INTELLIGENCE AND MACHINE LEARNING UNIT 1

1.3.3 Thinking rationally: The “laws of thought” approach

o The Greek philosopher Aristotle was one of the first to attempt to
codify “right thinking”—that is, irrefutable reasoning processes.
o His syllogisms provided patterns for argument structures that
always yielded correct conclusions when given correct premises.
o These laws of thought were supposed to govern the operation of
the mind; the study initiated the field called logic.
o By 1965, programs could solve any solvable problem described in
logical notation. The so-called logicist tradition.
o The theory of probability allows rigorous reasoning with uncertain
information.
o In principle, it allows the construction of a comprehensive model of
rational thought, leading from raw perceptual information to an
understanding of how the world works to predictions about the
future.
1.3.4 Acting rationally: The rational agent approach
o An agent is just something that acts to do.
o Computer agents are expected to do more: operate autonomously,
perceive their environment, persist over a prolonged time period,
adapt to change, and create and pursue goals.
o A rational agent is one that acts so as to achieve the best outcome
or, when there is uncertainty, the best expected outcome.

1.4 Subfields of Artificial Intelligence

o Machine Learning
 Machine Learning is the learning in which a machine can learn on its
own from examples and previous experiences.
 It is machine learning that gives AI the ability to learn.
 This is done by using algorithms to discover patterns and generate
insights from the data they are exposed to.
Deep Learning
 Deep learning, is a subcategory of machine learning, provides AI with
the ability to mimic a human brain’s neural network.
 It can make sense of patterns, noise, and sources of confusion in the
data.
o Neural Networks
 Artificial Neural Networks (ANNs) are one of the most important tools
in Machine Learning to find patterns within the data, which are far too
complex for a human to figure out and teach the machine to
recognize.

PREPARED BY: R.Mohan, G.Selvapriya, S.Shanmuganathan, AP/CSE 21

CS 3491 – ARTIFICIAL INTELLIGENCE AND MACHINE LEARNING UNIT 1

o Cognitive Computing:
 The ultimate goal of cognitive computing is to imitate the human
thought process in a computer model.
 Using self-learning algorithms, pattern recognition by neural
networks, and natural language processing, a computer can mimic
the human way of thinking.
 Here, computerized models are deployed to simulate the human
cognition process.
o Computer Vision:
 Computer vision works by allowing computers to see, recognize, and
process images, the same way as human vision does, and then it
provides an appropriate output.
 The computer must understand what it sees, and then analyze it,
accordingly.
o Natural Language Processing:
 Natural language processing means developing methods that help us
communicate with machines using natural human languages like
English.

1.5 Advantages of Artificial Intelligence

o Reduced human error:
• With humans involved in the tasks where precision is required, there
will always be a chance of error. However, if programmed properly,
machines do not make mistakes and easily perform repetitive tasks
without making many errors, if not at all.
o Risk avoidance:
• Replacing humans with intelligent robots is one of the biggest
advantages of Artificial Intelligence.
o Replacing repetitive jobs:
• Our day-to-day work includes many repetitive tasks that we have to do
every day without any change. Now, machines have replaced these
tasks so that humans can spend this time doing creative things.
o Digital assistance:
• With digital assistants to interact with users 24/7, organizations can
save the need for human resources and deliver faster service to
customers.
1.6 Limitations of Artificial Intelligence
o High cost of creation.
o No emotions.
o Box thinking.
o Can’t think for Itself.

PREPARED BY: R.Mohan, G.Selvapriya, S.Shanmuganathan, AP/CSE 22

CS 3491 – ARTIFICIAL INTELLIGENCE AND MACHINE LEARNING UNIT 1

2. Compare and differentiate between Data Science vs Artificial Intelligence

Vs. Machine Learning.

DATA SCIENCE Vs. ARTIFICIAL INTELLIGENCE

2.1 Deep learning vs. machine learning
2.2 Data Science vs. Artificial Intelligence vs. Machine Learning

2.1 Deep learning vs. machine learning

Figure 1.4 - Deep learning vs. machine learning

o “Artificial intelligence is a set of algorithms and intelligence to try to

mimic human intelligence.
o Machine learning is one of them, and deep learning is one of those
machine learning techniques as shown in figure 1.4.”
o Machine Learning
 Machine Learning is the learning in which a machine can learn on
its own from examples and previous experiences.
 It is machine learning that gives AI the ability to learn.
 This is done by using algorithms to discover patterns and generate
insights from the data they are exposed to.
o Deep Learning
 Deep learning, is a subcategory of machine learning, provides AI
with the ability to mimic a human brain’s neural network.
 It can make sense of patterns, noise, and sources of confusion in
the data.

2.2 Data Science vs. Artificial Intelligence vs. Machine Learning

 Data Science, Machine Learning, and Artificial Intelligence are
interconnected, but each one of them uniquely serves a different
purpose as mentioned in table 1.2

PREPARED BY: R.Mohan, G.Selvapriya, S.Shanmuganathan, AP/CSE 23

CS 3491 – ARTIFICIAL INTELLIGENCE AND MACHINE LEARNING UNIT 1

Table 1.2 - Comparison of Data Science, Artificial Intelligence and Machine Learning:

Data Science Artificial Intelligence Machine Learning

Machine Learning is a part of
Data Science is used for data AI combines iterative AI where mathematical models
sourcing, cleaning, processing, processing and intelligent are used to empower a
and visualizing for analytical algorithms to imitate the machine to learn with or
purposes. human brain’s functions. without being programmed
regularly.
AI uses decision trees and
Data Science deals with both Machine Learning utilizes
logic theories to find the
structured and unstructured statistical models and neural
best possible solution to
data for analytics. networks to train a machine.
the given problem.
As a subset of AI, Machine
Some of the popular tools in Some of the popular
Learning also use the same
Data Science are Tableau, SAS2, libraries include Keras,
libraries, along with tools such
Apache, MATLAB, Spark, and Scikit-Learn, and
as Amazon Lex2, IBM Watson,
more. TensorFlow.
and Azure ML Studio.
AI includes predictive
Data Science includes data
modeling to predict events ML is a subset of Artificial
operations based on user
based on the previous and Intelligence.
requirements.
current data.
It is mainly used in fraud Applications of AI include Online recommendations, facial
detection, healthcare, BI chat bots, voice assistants, recognition, and NLP are a few
analysis, and more. and weather prediction. examples of ML.

3. Explain the history of artificial intelligence.

 The important events and milestones in the evolution of artificial intelligence
include the following:
1943:
 The first work that is now generally recognized as AI was done by
Warren McCulloch and Walter Pitts (1943).
 Inspired by the mathematical modeling work of Pitts’s advisor Nicolas
they drew on three sources:
 knowledge of the basic physiology and function of neurons in the
brain;
 a formal analysis of propositional logic;
 Turing’s theory of computation.
1949:
 Donald Hebb (1949) demonstrated a simple updating rule for
modifying the connection strengths between neurons.
 His rule, now called Hebbian learning, remains an influential model to
this day.

PREPARED BY: R.Mohan, G.Selvapriya, S.Shanmuganathan, AP/CSE 24

CS 3491 – ARTIFICIAL INTELLIGENCE AND MACHINE LEARNING UNIT 1

1950:
 Alan Turing publishes Computing Machinery and Intelligence. In the
paper, Turing—proposes to answer the question 'can machines
think?' and introduces the Turing Test to determine if a computer can
demonstrate the same intelligence as a human. The value of the
Turing test has been debated ever since.
1956:
 John McCarthy coins the term 'artificial intelligence' at the first-ever
AI conference at Dartmouth College. Later that year, Allen Newell,
J.C. Shaw, and Herbert Simon create the Logic Theorist, the first-ever
running AI software program.
1958:
 In 1958, John McCarthy had defined the high-level language Lisp,
which was to become the dominant AI programming language for the
next 30 years.
1959:
 At IBM, Nathaniel Rochester and his colleagues produced some of the
first AI programs.
 Herbert Gelernter (1959) constructed the Geometry Theorem Prover,
which was able to prove theorems that many students of mathematics
would find quite tricky.
1967:
 Frank Rosenblatt builds the Mark 1 Perceptron, the first computer
based on a neural network that 'learned' though trial and error.
1980s:
 Neural networks which use a back propagation algorithm to train
itself become widely used in AI applications.
1997:
 IBM's Deep Blue beats then world chess champion Garry Kasparov, in
a chess match (and rematch).
2011:
 IBM Watson beats champions Ken Jennings and Brad Rutter
at Jeopardy!
2015:
 Baidu'sMinwa supercomputer uses a special kind of deep neural
network called a convolution neural network to identify and categorize
images with a higher rate of accuracy than the average human.
2016:
 DeepMind'sAlphaGo program, powered by a deep neural network,
beats Lee Sodol, the world champion Go player, in a five-game match.

PREPARED BY: R.Mohan, G.Selvapriya, S.Shanmuganathan, AP/CSE 25

CS 3491 – ARTIFICIAL INTELLIGENCE AND MACHINE LEARNING UNIT 1

 The first “robot citizen,” a humanoid robot named Sophia, is created

by Hanson Robotics and is capable of facial recognition, verbal
communication and facial expression.
2018:
 Google releases natural language processing engine BERT, reducing
barriers in translation and understanding by ML applications.
2020:
 Baidu releases its Linear Fold AI algorithm to scientific and medical
teams working to develop a vaccine during the early stages of the
SARS-CoV-2 pandemic.
 OpenAI releases natural language processing model GPT-3, which is
able to produce text modeled after the way people speak and write.
2022:
 The National Institute of Standards and Technology releases the first
draft of its AI Risk Management Framework, voluntary U.S. guidance
“to better manage risks to individuals, organizations, and society
associated with artificial intelligence.”
 DeepMind unveils Gato, an AI system trained to perform hundreds of
tasks, including playing Atari, captioning images and using a robotic
arm to stack blocks.

4. List and explain the various Applications of Artificial Intelligence or AI

Applications.

APPLICATIONS OF AI:
4.1. AI Application in E-Commerce
4.2. AI Application in Education
4.3. AI Application in Lifestyle
4.4. AI Application in Navigation
4.5. AI Application in Robotics
4.6. AI Application in Human Resource
4.7. AI Application in Healthcare
4.8. AI Application in Agriculture
4.9. AI Application in Gaming
4.10. AI Application in Automobiles
4.11. AI Application in Social Media
4.12. AI Application in Marketing
4.13. AI Application in Chatbots
4.14. AI Application in Finance

4.1. AI Application in E-Commerce

 Personalized Shopping
o Artificial Intelligence technology is used to create
recommendation engines.

PREPARED BY: R.Mohan, G.Selvapriya, S.Shanmuganathan, AP/CSE 26

CS 3491 – ARTIFICIAL INTELLIGENCE AND MACHINE LEARNING UNIT 1

o These recommendations are made in accordance with their

browsing history, preference, and interests.
o It helps in improving the relationship with the customers and the
loyalty towards the brand.
 AI-powered Assistants
o Virtual shopping assistants and chatbots help improve the user
experience while shopping online.
o Natural Language Processing is used to make the conversation
sound as human and personal as possible.
 Fraud Prevention
o Credit card frauds and fake reviews are two of the most
significant issues that E-Commerce companies deal with.
o Many customers prefer to buy a product or service based on
customer reviews.
o AI can help identify and handle fake reviews.

4.2.AI Application in Education

 Administrative Tasks Automated to Aid Educators
o Artificial Intelligence can help educators with non-educational
tasks like task-related duties like facilitating and automating
personalized messages to students, back-office tasks like grading
paperwork, arranging and facilitating parent and guardian
interactions, routine issue feedback facilitating, managing
enrollment, courses, and HR-related topics.
 Creating Smart Content
o Digitization of content like video lectures, conferences, and text
book guides can be made using Artificial Intelligence.
o Artificial Intelligence helps create a rich learning experience by
generating and providing audio and video summaries and integral
lesson plans.
 Voice Assistants
o Without even the direct involvement of the lecturer or the teacher, a
student can access extra learning material or assistance through
Voice Assistants.
 Personalized Learning
o Using top AI technologies, hyper-personalization techniques can be
used to monitor students’ data thoroughly, and habits, lesson
plans, reminders, study guides, flash notes, frequency or revision,
etc., can be easily generated.

PREPARED BY: R.Mohan, G.Selvapriya, S.Shanmuganathan, AP/CSE 27

CS 3491 – ARTIFICIAL INTELLIGENCE AND MACHINE LEARNING UNIT 1

4.3. AI Application in Lifestyle

 Autonomous Vehicles
o Automobile manufacturing companies like Toyota, Audi, Volvo,
and Tesla use machine learning to train computers to think and
evolve like humans when it comes to driving in any environment
and object detection to avoid accidents.
 Spam Filters
o The email has AI that filters out spam emails sending them to
spam or trash folders.
o The popular email provider, Gmail, has managed to reach a
filtration capacity of approximately 99.9%.
 Facial Recognition
o Devices like our phones, laptops, and PCs use facial recognition
techniques by using face filters to detect and identify in order to
provide secure access.
o Facial recognition is a widely used Artificial Intelligence application
even in high security-related areas in several industries.
 Recommendation System
o Various platforms in daily lives like e-commerce, entertainment
websites, social media, video sharing platforms, like youtube, etc.,
all use the recommendation system to get user data and provide
customized recommendations to users to increase engagement.

4.4. AI Application in Navigation

 The technology uses a combination of Convolution Neural
Network and Graph Neural Network, which makes lives easier for
users by automatically detecting the number of lanes and road types
behind obstructions on the roads.
 AI is heavily used by Uber and many logistics companies to improve
operational efficiency, analyze road traffic, and optimize routes.

4.5. AI Application in Robotics

 Robotics is another field where artificial intelligence applications are
commonly used.
 Robots powered by AI use real-time updates to sense obstacles in its
path and pre-plan its journey instantly.
 It can be used for -
 Carrying goods in hospitals, factories, and warehouses
 Cleaning offices and large equipment
 Inventory management

PREPARED BY: R.Mohan, G.Selvapriya, S.Shanmuganathan, AP/CSE 28

CS 3491 – ARTIFICIAL INTELLIGENCE AND MACHINE LEARNING UNIT 1

4.6. AI Application in Human Resource

 Artificial Intelligence helps with blind hiring.
 AI drive systems can scan job candidates' profiles, and resumes to
provide recruiters an understanding of the talent pool they must
choose from.

4.7. AI Application in Healthcare

 AI applications are used in healthcare to build sophisticated
machines that can detect diseases and identify cancer cells.
 Artificial Intelligence can help analyze chronic conditions with lab
and other medical data to ensure early diagnosis.
 AI uses the combination of historical data and medical intelligence
for the discovery of new drugs.

4.8. AI Application in Agriculture

 Artificial Intelligence is used to identify defects and nutrient
deficiencies in the soil.
 This is done using computer vision, robotics, and machine learning
applications, AI can analyze where weeds are growing.
 AI bots can help to harvest crops at a higher volume and faster pace
than human laborers.

4.9. AI Application in Gaming

 AI can be used to create smart, human-like NPCs to interact with
the players.
 It can also be used to predict human behavior using which game
design and testing can be improved.
 The Alien Isolation games released in 2014 uses AI to stalk the
player throughout the game. The game uses two Artificial
Intelligence systems - ‘Director AI’ that frequently knows your
location and the ‘Alien AI,’ driven by sensors and behaviors that
continuously hunt the player.

4.10. AI Application in Automobiles

 Artificial Intelligence is used to build self-driving vehicles.
 AI can be used along with the vehicle’s camera, radar, cloud
services, GPS, and control signals to operate the vehicle.
 AI can improve the in-vehicle experience and provide additional
systems like emergency braking, blind-spot monitoring, and driver-
assist steering.

PREPARED BY: R.Mohan, G.Selvapriya, S.Shanmuganathan, AP/CSE 29

CS 3491 – ARTIFICIAL INTELLIGENCE AND MACHINE LEARNING UNIT 1

4.11. AI Application in Social Media

 Instagram
 Facebook
 Twitter

4.12. AI Application in Marketing

 Artificial intelligence (AI) applications are popular in the marketing
domain as well.
 AI can provide users with real-time personalization’s based on their
behavior and can be used to edit and optimize marketing campaigns
to fit a local market's needs.

4.13. AI Application in Chatbots

 AI chatbots can comprehend natural language and respond to
people online who use the "live chat" feature that many
organizations provide for customer service.
 As AI continues to improve, these chatbots can effectively resolve
customer issues, respond to simple inquiries, improve customer
service, and provide 24/7 support.

4.14. AI Application in Finance

 Whether it’s personal finance, corporate finance, or consumer
finance, the highly evolved technology that is offered through AI can
help to significantly improve a wide range of financial services.
 Artificial intelligence can also detect changes in transaction patterns
and other potential red flags that can signify fraud, which humans
can easily miss, and thus saving businesses and individuals from
significant loss.
 Aside from fraud detection and task automation, AI can also better
predict and assess loan risks.

5. Explain in detail about agents and its types in detail.

AGENTS
5.1 Agents and Environments
5.2 Basic Terminologies
5.3 Steps in designing an agent
5.4 Types of Agent

PREPARED BY: R.Mohan, G.Selvapriya, S.Shanmuganathan, AP/CSE 30

CS 3491 – ARTIFICIAL INTELLIGENCE AND MACHINE LEARNING UNIT 1

5.1Agents and Environments

Agent
 An agent is anything that can be viewed as perceiving its
environment through sensors and acting upon that environment
through actuators as in Figure 1.5.
 Example:
o A human agent has eyes, ears, and other organs for sensors
and hands, legs, vocaltract, and so on for actuators.
o A robotic agent might have cameras and infrared range
findersfor sensors and various motors for actuators.
o A software agent receives file contents, network packets, and
human input (keyboard/mouse/touch screen/voice) as sensory
inputsand acts on the environment by writing files, sending
network packets, and displaying information or generating
sounds.

Environments
 The environment could be everything—the entire universe.

Figure 1.5: Agents and Environments

5.2 Basic Terminologies

Agent Function
 The agent function for an agent specifies the action taken by the
agent in response to any percept sequence.
Performance Measure
 The performance measure evaluates the behavior of the agent in
an environment.

PREPARED BY: R.Mohan, G.Selvapriya, S.Shanmuganathan, AP/CSE 31

CS 3491 – ARTIFICIAL INTELLIGENCE AND MACHINE LEARNING UNIT 1

Rational Agent
 A rational agent acts so as to maximize the expected value of the
performance measure, given the percept sequence it has seen so
far.
Task Environment
 A task environment specification includes the performance
measure, the external environment, the actuators, and the
sensors.

5.3 Steps in designing an agent

5.4 Types of Agent

 Simple reflex agents respond directly to percepts. Refer fig 1.6
 Model-based reflex agents maintain internal state to track aspects of
the world that are not evident in the current percept. Refer fig 1.7
 Goal-based agents act to achieve their goals Refer fig 1.8
 Utility-based agents try to maximize their own expected “happiness.”
Refer fig 1.9

Simple reflex agents

Figure 1.6: Simple reflex agents

PREPARED BY: R.Mohan, G.Selvapriya, S.Shanmuganathan, AP/CSE 32

CS 3491 – ARTIFICIAL INTELLIGENCE AND MACHINE LEARNING UNIT 1

Model-based reflex

Figure 1.7: Model-based reflex

Goal-based agents

Figure 1.8: Goal-based agents

Utility-based agents

Figure 1.9: Utility-based agents

Learning agents:

PREPARED BY: R.Mohan, G.Selvapriya, S.Shanmuganathan, AP/CSE 33

CS 3491 – ARTIFICIAL INTELLIGENCE AND MACHINE LEARNING UNIT 1

It allows the agent to operate in initially unknown environments and to become more
competent than its initial knowledge alone might allow. A learning agent can be divided into
four conceptual components, as shown in figure 1.10:

Figure 1.10: Learning agents

6. Explain in detail about problem solving agent with example in detail.

PROBLEM SOLVING AGENT

6.1 Problem Solving Agent Definition

6.2 Steps performed by Problem-solving agent
6.3 Types of problem approaches:
6.3.1 Toy Problem or Standardized Problem
6.3.2 Real-world Problem

6.1 Problem Solving Agent Definition

 Problem-solving agent is a goal-based agent that focus on goals using
a group of algorithms and techniques to solve a well-defined problem.
 An agent may need to plan a sequence of actions that form a path to a
goal state.
 Such an agent is called a problem-solving agent, and the
computational process it undertakes is called search.
6.2 Steps performed by Problem-solving agent
 Goal Formulation:
o It is the first and simplest step in problem-solving.
o It organizes the steps/sequence required to formulate one goal out
of multiple goals as well as actions to achieve that goal. Goal

PREPARED BY: R.Mohan, G.Selvapriya, S.Shanmuganathan, AP/CSE 34

CS 3491 – ARTIFICIAL INTELLIGENCE AND MACHINE LEARNING UNIT 1

formulation is based on the current situation and the agent's

performance measure
 Problem Formulation:
o It is the most important step of problem-solving which decides
what actions should be taken to achieve the formulated goal.
o Components in problem formulation:
1. Initial State: It is the starting state or initial step of the agent
towards its goal.
2. Actions: It is the description of the possible actions available to
the agent.
3. Transition Model: It describes what each action does.
4. Goal Test: It determines if the given state is a goal state.
5. Path cost: It assigns a numeric cost to each path that follows
the goal.
The problem-solving agent selects a cost function, which reflects
its performance measure.
o Initial state, actions, and transition model together define
the state-space of the problem implicitly.
o State-space of a problem is a set of all states which can be reached
from the initial state followed by any sequence of actions.
o The state-space forms a directed map or graph where nodes are
the states, links between the nodes are actions, and the path is a
sequence of states connected by the sequence of actions.
 Search:
o It identifies all the best possible sequence of actions to reach the
goal state from the current state. It takes a problem as an input
and returns solution as its output.
 Solution:
o It finds the best algorithm out of various algorithms, which may be
proven as the best optimal solution.
 Execution:
o It executes the best optimal solution from the searching algorithms
to reach the goal state from the current state.

6.3 Types of problem approaches:

6.3.1 Toy Problem or Standardized Problem:
o It is a concise and exact description of the problem which is
used by the researchers to compare the performance of
algorithms.

PREPARED BY: R.Mohan, G.Selvapriya, S.Shanmuganathan, AP/CSE 35

CS 3491 – ARTIFICIAL INTELLIGENCE AND MACHINE LEARNING UNIT 1

Example Problem 1 - 8 Puzzle Problem:

 Consider a 3x3 matrix with movable tiles numbered from 1 to 8 with a
blank space.
 The tile adjacent to the blank space can slide into that space.
 The objective is to reach a specified goal state similar to the goal state,
as shown in the below figure 1.11
 The task is to convert the current state into goal state by sliding digits
into the blank space

Figure 1.11: Agents and Environments

The problem formulation is as follows:

 States:
o It describes the location of each numbered tiles and the blank tile.
 Initial State:
o Can start from any state as the initial state.
 Actions:
o Actions of the blank space is defined, i.e., either left, right, up or
down
 Transition Model:
o It returns the resulting state as per the given state and actions.
 Goal test:
o It identifies whether have reached the correct goal-state.
 Path cost:
o The path cost is the number of steps in the path where the cost of
each step is 1.

Note: The 8-puzzle problem is a type of sliding-block problem which is

used for testing new search algorithms in artificial intelligence.

Example Problem 2 – Grid World Problem:

 A grid world problem is a two-dimensional rectangular array of
square cells in which agents can move from cell to cell.
 Typically the agent can move to any obstacle-free adjacent cell—
horizontally or vertically and in some problems diagonally.

PREPARED BY: R.Mohan, G.Selvapriya, S.Shanmuganathan, AP/CSE 36

CS 3491 – ARTIFICIAL INTELLIGENCE AND MACHINE LEARNING UNIT 1

 Cells can contain objects, which the agent can pick up, push, or
otherwise act upon; a wall or other impassible obstacle in acell
prevents an agent from moving into that cell.
 The vacuum world canbe formulated as a grid world problem as
shown in figure 1.12:

Figure 1.12: Vacuum World

The problem formulation is as follows:

 States:
o For the vacuum world, the objects are the agent and any dirt. In
the simple two-cell version, the agent can be in either of the two
cells, and each cell can either contain dirt or not.
 Initial state:
o Any state can be designated as the initial state.
 Actions:
o In the two-cell world three actions are: suck, move left, and move
Right, upward and downward, or forward, backward, turnright,
and turnleft.
 Transition model:
o Suck removes any dirt from the agent’s cell;
o Forward moves the agent ahead one cell in the direction it is
facing, unless it hits a wall, in which case the action has no effect.
o Backward moves the agent in the opposite direction
o Turnright and turnleft change the direction it is facing by.
 Goal states:
o The states in which every cell is clean.
 Action cost:
o Each action costs 1.
6.3.2 Real-world Problem:
 It is real-world based problems which require solutions.

PREPARED BY: R.Mohan, G.Selvapriya, S.Shanmuganathan, AP/CSE 37

CS 3491 – ARTIFICIAL INTELLIGENCE AND MACHINE LEARNING UNIT 1

Example Problem 1 – Route-Finding Problem:

 Route-finding problem is defined in terms of specified locations and
transitions along edges between them.
The problem formulation is as follows:
 Consider the airline travel problems that must be solved by a travel-
planning Web site:
• States:
o Each state obviously includes a location (e.g., an airport) and the
current time.
• Initial state:
o The user’s home airport.
• Actions:
o Take any flight from the current location, in any seat class, leaving
after the current time, leaving enough time for within-airport
transfer if needed.
• Transition model:
o The state resulting from taking a flight will have the flight’s
Destination as the new location and the flight’s arrival time as the
new time.
• Goal state:
o A destination city.
• Action cost:
o A combination of monetary cost, waiting time, flight time, customs
and Immigration procedures, seat quality, time of day, type of
airplane, frequent-flyer reward Points, and so on.

Algorithm – Problem solving agent

PREPARED BY: R.Mohan, G.Selvapriya, S.Shanmuganathan, AP/CSE 38

CS 3491 – ARTIFICIAL INTELLIGENCE AND MACHINE LEARNING UNIT 1

7. Explain in detail about search algorithms in detail with example.

SEARCH ALGORITHMS
7.1 Search Algorithm
7.2 Types of Search Algorithm
7.2.1 UNINFORMED SEARCH ALGORITHMS
7.2.1.1 Depth First Search
7.2.1.2 Depth Limited Search
7.2.1.3 Breadth First Search
7.2.1.4 Iterative Deepening Search
7.2.1.5 Uniform Cost Search
7.2.1.6 Bidirectional Search
7.2.2 INFORMED (HEURISTIC) SEARCH STRATEGIES
7.2.2.1 Greedy Search
7.2.2.2 A* Tree Search
7.2.2.3 A* Graph Search

7.1 Search Algorithm

o A search algorithm takes a search problem as input and returns a

solution.

7.2 Types of Search Algorithm

Figure 1.13: Types of Search Algorithms

7.2.1 UNINFORMED SEARCH ALGORITHMS:

• Uninformed search is also called Blind search.
• These algorithms can only generate the successors and
differentiate between the goal state and non goal state.

PREPARED BY: R.Mohan, G.Selvapriya, S.Shanmuganathan, AP/CSE 39

CS 3491 – ARTIFICIAL INTELLIGENCE AND MACHINE LEARNING UNIT 1

• These type of search does not maintain any internal state, that’s
it is also known as Blind search.
• Types of uninformed search algorithms are shown in Figure 1.13.
7.2.1.1 Depth First Search
7.2.1.2 Depth Limited Search
7.2.1.3 Breadth First Search
7.2.1.4 Iterative Deepening Search
7.2.1.5 Uniform Cost Search
7.2.1.6 Bidirectional Search
Each of these algorithms will have:
 A problem graph, containing the start node S and the goal node
G.
 A strategy, describing the manner in which the graph will be
traversed to get to G.
 A fringe, which is a data structure used to store all the possible
states (nodes) from the current states.
 A tree, that results while traversing to the goal node.
 A solution plan, which the sequence of nodes from S to G.

7.2.1.1 DEPTH FIRST SEARCH:

• Depth-first search (DFS) is an algorithm for traversing or searching
tree or graph data structures.
• The algorithm starts at the root node and explores as far as possible
along each branch before backtracking.
• It uses last in-first-out strategy and hence it is implemented using a
stack.
d= the depth of the search tree = the number of levels of the search
tree.
n^i = number of nodes in level i
 DFS Example – Refer Figure 1.14

The performance measure of DFS

 Completeness: DFS does not guarantee to reach the goal state.
 Optimality: It does not give an optimal solution as it expands nodes
in one direction deeply.
 Space complexity: It needs to store only a single path from the root
node to the leaf node. Therefore, DFS has O(bm) space complexity
where b is the branching factor(i.e., total no. of child nodes, a
parent node have) and m is the maximum length of any path.
 Time complexity: DFS has O(bm) time complexity.

PREPARED BY: R.Mohan, G.Selvapriya, S.Shanmuganathan, AP/CSE 40

CS 3491 – ARTIFICIAL INTELLIGENCE AND MACHINE LEARNING UNIT 1

Figure 1.14: DFS Example

Advantage of Depth-first Search:

 DFS uses extremely little memory.
 It takes less time to reach the goal node than the BFS method.

Disadvantage of Depth-first Search:

 There's a chance that many states will recur, and there's no certainty
that a solution will be found.
 The DFS algorithm performs deep searching and may occasionally
enter an infinite cycle.

Example:

PREPARED BY: R.Mohan, G.Selvapriya, S.Shanmuganathan, AP/CSE 41

CS 3491 – ARTIFICIAL INTELLIGENCE AND MACHINE LEARNING UNIT 1

Which solution would DFS find to move from node S to node G if

run on the graph below?

Solution.

Path: S -> A -> B -> C -> G

Algorithm

PREPARED BY: R.Mohan, G.Selvapriya, S.Shanmuganathan, AP/CSE 42

CS 3491 – ARTIFICIAL INTELLIGENCE AND MACHINE LEARNING UNIT 1

7.2.1.2 DEPTH-LIMITED SEARCH ALGORITHM:

 A depth-limited search algorithm is similar to depth-first search with a
predetermined limit.
 Depth-limited search can solve the drawback of the infinite path in
the Depth-first search.
 In this algorithm, the node at the depth limit will treat as it has no
successor nodes further.
 Depth-limited search can be terminated with two Conditions of failure:
o Standard failure value: It indicates that problem does not have any
solution.
o Cutoff failure value: It defines no solution for the problem within a
given depth limit.

Depth-limited search Algorithm

 Set a variable NODE to the initial state, i.e., the root node.
 Set a variable GOAL which contains the value of the goal state.
 Set a variable LIMIT which carries a depth-limit value.
 Loop each node by traversing in DFS manner till the depth-limit
value.
 While performing the looping, start removing the elements from the
stack in LIFO order.
 If the goal state is found, return goal state. Else terminate the
search.
 Example – Refer Figure 1.15

The performance measure of Depth-limited search

 Completeness: Depth-limited search does not guarantee to reach the
goal node.
 Optimality: It does not give an optimal solution as it expands the
nodes till the depth-limit.
 Space Complexity: The space complexity of the depth-limited search
is O(bl).
 Time Complexity: The time complexity of the depth-limited search
is O(bl).

PREPARED BY: R.Mohan, G.Selvapriya, S.Shanmuganathan, AP/CSE 43

CS 3491 – ARTIFICIAL INTELLIGENCE AND MACHINE LEARNING UNIT 1

Example:

Figure 1.15 –Depth Limited Search

7.2.1.3 BREADTH FIRST SEARCH:

• Breadth-first search (BFS) is an algorithm for traversing or searching
tree or graph data structures.
• It starts at the tree root and explores all of the neighbor nodes at the
present depth prior to moving on to the nodes at the next depth level
Refer Figure 1.16.
• It is implemented using a queue.
S= the depth of the shallowest solution.

n^i= number of nodes in level .

 The performance measure of BFS is as follows:
o Completeness: It is a complete strategy as it definitely finds the goal
state.
o Optimality: It gives an optimal solution if the cost of each node is
same.
o Space Complexity: The space complexity of BFS is O(bd), i.e., it
requires a huge amount of memory. Here, b is the branching
factor and d denotes the depth/level of the tree
o Time Complexity: BFS consumes much time to reach the goal node
for large instances.

PREPARED BY: R.Mohan, G.Selvapriya, S.Shanmuganathan, AP/CSE 44

CS 3491 – ARTIFICIAL INTELLIGENCE AND MACHINE LEARNING UNIT 1

Figure 1.16: Breadth First Search

Breadth First Search Algorithm:

Advantages of Breadth-first Search:

 If a solution is available, BFS will provide it.
 If there are multiple answers to a problem, BFS will present the
simplest solution with the fewest steps.
Disadvantages of Breadth-first Search:
 It necessitates a large amount of memory.
 If the solution is located far from the root node, BFS will take a long
time.
Example:
Which solution would BFS find to move from node S to node G if
run on the graph below?

PREPARED BY: R.Mohan, G.Selvapriya, S.Shanmuganathan, AP/CSE 45

CS 3491 – ARTIFICIAL INTELLIGENCE AND MACHINE LEARNING UNIT 1

Solution.

Path: S -> D -> G

7.2.1.4 ITERATIVE DEEPENING DEPTH-FIRST SEARCH/

ITERATIVE DEEPENING SEARCH
 This search is a combination of BFS and DFS, as BFS guarantees to
reach the goal node and DFS occupies less memory space.
 Therefore, iterative deepening search combines these two advantages of
BFS and DFS to reach the goal node.
 It gradually increases the depth-limit from 0,1,2 and so on and reach
the goal node.

Figure 1.17 – Iterative Deepening Search

 In the above figure 1.17, the goal node is H and initial

depth-limit =[0-1].
 So, it will expand level 0 and 1 and will terminate with
A->B->C sequence.

PREPARED BY: R.Mohan, G.Selvapriya, S.Shanmuganathan, AP/CSE 46

CS 3491 – ARTIFICIAL INTELLIGENCE AND MACHINE LEARNING UNIT 1

 Further, change the depth-limit =[0-3], it will again expand the nodes
from level 0 till level 3 and the search terminate
with A->B->D->F->E->H sequence where H is the desired goal node.

Iterative deepening search Algorithm

 Explore the nodes in DFS order.
 Set a LIMIT variable with a limit value.
 Loop each node up to the limit value and further increase the limit
value accordingly.
 Terminate the search when the goal state is found.

The performance measure of Iterative deepening search

 Completeness: Iterative deepening search may or may not reach the
goal state.
 Optimality: It does not give an optimal solution always.
 Space Complexity: It has the same space complexity as BFS,
i.e., O(bd).
 Time Complexity: It has O(d) time complexity.

Disadvantages of Iterative deepening search

 The drawback of iterative deepening search is that it seems wasteful
because it generates states multiple times.

7.2.1.5 UNIFORM COST SEARCH:

 A searching algorithm for traversing a weighted tree or graph is uniform-
cost search.
 When a separate cost is provided for each edge, this algorithm is used.
 The uniform-cost search's main purpose is to discover the shortest path
to the goal node with the lowest cumulative cost.
 Cost of a node is defined as:
cost(node) = cumulative cost of all nodes from root
cost(root) = 0
 Completeness: The uniform-cost search is complete, so if a solution
exists, UCS will discover it.
o Time Complexity: Let C* be the cost of the best solution, and be
the cost of each step toward the goal node. The number of steps is
then equal to C*/ε+1. We've chosen +1 because we're starting from
state 0 and ending at C*/ε.
o As a result, the Uniform-cost search's worst-case time complexity
is O(b1 + [C*/ε])/.

PREPARED BY: R.Mohan, G.Selvapriya, S.Shanmuganathan, AP/CSE 47

CS 3491 – ARTIFICIAL INTELLIGENCE AND MACHINE LEARNING UNIT 1

o Space Complexity: The same argument applies to space complexity,

therefore Uniform-cost search's worst-case space complexity is O(b1 +

[C*/ε]).

o Optimal: Uniform-cost search is always the best option because it

only chooses the cheapest path.

Uniform Cost Search Algorithm

The performance measure of Uniform-cost search

 Completeness: It guarantees to reach the goal state.
 Optimality: It gives optimal path cost solution for the search.
 Space and time complexity: The worst space and time complexity of
the uniform-cost search is O(b1+LC*/??).

Advantages of Uniform-cost Search:

o The path with the lowest cost is chosen at each state.
o UCS is complete only if states are finite and there should be no loop
with zero weight.
o UCS is optimal only if there is no negative cost.
Disadvantages of Uniform-cost Search:
o It isjust concerned with the expense of the path.

Example:
Which solution would UCS find to move from node S to node G if run
on the graph below?

Solution.

PREPARED BY: R.Mohan, G.Selvapriya, S.Shanmuganathan, AP/CSE 48

CS 3491 – ARTIFICIAL INTELLIGENCE AND MACHINE LEARNING UNIT 1

Path: S -> A -> B -> G

Cost: 5

7.2.1.6 BIDIRECTIONAL SEARCH

 The strategy behind the bidirectional search is to run two searches
simultaneously--one forward search from the initial state and
other from the backside of the goal--hoping that both searches will
meet in the middle.
 As soon as the two searches intersect one another, the bidirectional
search terminates with the goal node.
 This search is implemented by replacing the goal test to check if the
two searches intersect. Because if they do so, it means a solution is
found.
The performance measure of Bidirectional search
 Complete: Bidirectional search is complete.
 Optimal: It gives an optimal solution.
 Time and space complexity: Bidirectional search has O(bd/2)
Disadvantage of Bidirectional Search
 It requires a lot of memory space.

7.2.2 INFORMED (HEURISTIC) SEARCH STRATEGIES:

 Here, the algorithms have information on the goal state, which helps in
more efficient searching.
 This information is obtained by something called a heuristic.

Search Heuristics:
 In an informed search, a heuristic is a functionh(n) estimates how
close a state is to the goal state.
 h(n) = estimated cost of the cheapest path from the state at node n to
a goal state.

PREPARED BY: R.Mohan, G.Selvapriya, S.Shanmuganathan, AP/CSE 49

CS 3491 – ARTIFICIAL INTELLIGENCE AND MACHINE LEARNING UNIT 1

Types of Informed search algorithms.

7.2.2.1 Greedy Search
7.2.2.2 A* Tree Search
7.2.2.3 A* Graph Search

7.2.2.1 GREEDY SEARCH:

 In greedy search, we expand the node closest to the goal node. The
“closeness” is estimated by a heuristic h(x).
 Heuristic: A heuristic h is defined as-
h(x) = Estimate of distance of node x from the goal node.
Lower the value of h(x), closer is the node from the goal.
 Strategy: Expand the node closest to the goal state, i.e. expand the
node with a lower h value.

The performance measure of Best-first search Algorithm:

 Completeness: Best-first search is incomplete even in finite state
space.
 Optimality: It does not provide an optimal solution.
 Time and Space complexity: It has O(bm) worst time and space
complexity, where m is the maximum depth of the search tree. If
the quality of the heuristic function is good, the complexities could be
reduced substantially.

 Advantage: Works well with informed search problems, with fewer

steps to reach a goal.
 Disadvantage: Can turn into unguided DFS in the worst case.

Example:
Question. Find the path from S to G using greedy search. The heuristic
values h of each node below the name of the node.

PREPARED BY: R.Mohan, G.Selvapriya, S.Shanmuganathan, AP/CSE 50

CS 3491 – ARTIFICIAL INTELLIGENCE AND MACHINE LEARNING UNIT 1

Path: S -> D -> E -> G

7.2.2.2 A* TREE SEARCH:

Heuristic:
The following points should be noted with heuristics in A* search.
f(x) = g(x) + h(x)
 h(x) is called the forward cost and is an estimate of the distance of
the current node from the goal node.
 g(x) is called the backward cost and is the cumulative cost of a
node from the root node.
 A* search is optimal only when for all nodes, the forward cost for a
node h(x) underestimates the actual cost h*(x) to reach the goal.
 This property of A* heuristic is called admissibility.
Admissibility: 0 <= h(x) <= h*(x)

Strategy: Choose the node with the lowest f(x) value.

PREPARED BY: R.Mohan, G.Selvapriya, S.Shanmuganathan, AP/CSE 51

CS 3491 – ARTIFICIAL INTELLIGENCE AND MACHINE LEARNING UNIT 1

The performance measure of A* search

 Completeness: The star(*) in A* search guarantees to reach the
goal node.
 Optimality: An underestimated cost will always give an optimal
solution.
 Space and time complexity: A* search has O(bd) space and
time complexities.

Example: Find the path to reach from S to G using A* search.

Solution.
 Starting from S, the algorithm computes g(x) + h(x) for all nodes in the
fringe at each step, choosing the node with the lowest sum.
 The entire work is shown in the table below.

A* algorithm

PREPARED BY: R.Mohan, G.Selvapriya, S.Shanmuganathan, AP/CSE 52

CS 3491 – ARTIFICIAL INTELLIGENCE AND MACHINE LEARNING UNIT 1

7.2.2.3 A* Graph Search:

 In A* tree search, if the same node has expanded twice in different
branches of the search tree, A* search might explore both of those
branches, thus wasting time
 A* Graph Search, or simply Graph Search, removes this limitation by
adding this rule: do not expand the same node more than once.
 Heuristic. Graph search is optimal only when the forward cost between
two successive nodes A and B, given by h(A) – h (B), is less than or equal
to the backward cost between those two nodes g(A -> B). This property
of the graph search heuristic is called consistency.
 Consistency: h(A) – h(B) <= g ( A->B )

Example:
Question. Use graph searches to find paths from S to G in the following
graph.

PREPARED BY: R.Mohan, G.Selvapriya, S.Shanmuganathan, AP/CSE 53

CS 3491 – ARTIFICIAL INTELLIGENCE AND MACHINE LEARNING UNIT 1

Solution

Path: S -> D -> B -> E -> G

Cost: 7
8. Explain in detail about Local Search Algorithms with an example.

LOCAL SEARCH ALGORITHMS

8.1. Local Search Algorithm - Definition
8.2. Working of a Local search algorithm
8.3. Different types of local searches:
8.3.1 Hill-climbing Search
8.3.1.1 Simple hill climbing
8.3.1.2 Steepest-ascent hill climbing
8.3.1.3 Stochastic hill climbing
8.3.1.4 Random-restart hill climbing
8.3.2 Simulated Annealing
8.3.3. Local Beam Search

8.1. Local Search Algorithm - Definition

 Local search algorithms operate by searching from a start state to
neighboring states,without keeping track of the paths, or the set of states
that have been reached.
 “Local search algorithms” where the path cost does not matters, and
only focus on solution-state needed to reach the goal node.

PREPARED BY: R.Mohan, G.Selvapriya, S.Shanmuganathan, AP/CSE 54

CS 3491 – ARTIFICIAL INTELLIGENCE AND MACHINE LEARNING UNIT 1

 It has two key advantages:

o use very little memory;
o Often find reasonable solutions in large or infinite state spaces for
which systematic algorithms are unsuitable.
 Local search algorithms can also solve optimization problems, in which
the aim is to find the best state according to an objective function.
 Optimization Problems - An optimization problem is one where all the
nodes can give a solution. But the target is to find the best state out of all
according to the objective function.
 Objective Function - An objective function is a function whose value is
either minimized or maximized in different contexts of the optimization
problems. In the case of search algorithms, an objective function can be
the path cost for reaching the goal node, etc.

8.2. Working of a Local search algorithm

 Consider the below state-space landscape having both:
o Location: It is defined by the state.
o Elevation: It is defined by the value of the objective function or
heuristic cost function.

Figure 1.18 – Local Search Algorithm

The local search algorithm explores the above landscape by finding the
following two points:
 Global Minimum: If the elevation corresponds to the cost, then the task
is to find the lowest valley, which is known as Global Minimum.

PREPARED BY: R.Mohan, G.Selvapriya, S.Shanmuganathan, AP/CSE 55

CS 3491 – ARTIFICIAL INTELLIGENCE AND MACHINE LEARNING UNIT 1

 Global Maxima: If the elevation corresponds to an objective function,

then it finds the highest peak which is called as Global Maxima. It is the
highest point in the valley. Refer fig 1.18

8.3 Different types of local searches:

8.3.1 Hill-climbing Search
8.3.2 Simulated Annealing
8.3.3 Local Beam Search

8.3.1 Hill-climbing Search

 Hill Climbing Algorithm: Hill climbing search is a local search problem.
 The hill-climbing search algorithm keeps track of one current state and
on each iteration moves to the neighboring state with highest value.
 The purpose of the hill climbing search is to climb a hill and reach the
topmost peak/ point of that hill.
 It is based on the heuristic search technique where the person who is
climbing up on the hill estimates the direction which will lead him to the
highest peak.

State-space Landscape of Hill climbing algorithm

 Consider the below landscape representing the goal state/peak and
the current state of the climber.

Figure 1.19 – Local Search Algorithm

PREPARED BY: R.Mohan, G.Selvapriya, S.Shanmuganathan, AP/CSE 56

CS 3491 – ARTIFICIAL INTELLIGENCE AND MACHINE LEARNING UNIT 1

 The topographical regions shown in the Figure 1.19 can be defined as:
 Global Maximum: It is the highest point on the hill, which is the goal
state.
 Local Maximum: It is the peak higher than all other peaks but lower
than the global maximum.
 Flat local maximum: It is the flat area over the hill where it has no
uphill or downhill. It is a saturated point of the hill.
 Shoulder: It is also a flat area where the summit is possible.
 Current state: It is the current position of the person.

Types of Hill climbing search algorithm

8.3.1.1 Simple hill climbing
8.3.1.2 Steepest-ascent hill climbing
8.3.1.3 Stochastic hill climbing
8.3.1.4 Random-restart hill climbing
8.3.1.1 Simple hill climbing search
 Simple hill climbing is the simplest technique to climb a hill.
 The task is to reach the highest peak of the mountain.
 Here, the movement of the climber depends on his move/steps.
 If he finds his next step better than the previous one, he continues to
move else remain in the same state.
 This search focus only on his previous and next step.
Simple hill climbing Algorithm
1. Create a CURRENT node, NEIGHBOUR node, and a GOAL node.
2. If the CURRENT node=GOAL node, return GOAL and terminate the
search.
3. Else CURRENT node<= NEIGHBOUR node, move ahead.
4. Loop until the goal is not reached or a point is not found.

8.3.1.2 Steepest-ascent hill climbing

 Steepest-ascent hill climbing is different from simple hill climbing
search. Unlike simple hill climbing search,
 It considers all the successive nodes, compares them, and choose the
node which is closest to the solution.
 Steepest hill climbing search is similar to best-first search because it
focuses on each node instead of one.
Steepest-ascent hill climbing algorithm
1. Create a CURRENT node and a GOAL node.
2. If the CURRENT node=GOAL node, return GOAL and terminate the
search.
3. Loop until a better node is not found to reach the solution.

PREPARED BY: R.Mohan, G.Selvapriya, S.Shanmuganathan, AP/CSE 57

CS 3491 – ARTIFICIAL INTELLIGENCE AND MACHINE LEARNING UNIT 1

4. If there is any better successor node present, expand it.

5. When the GOAL is attained, return GOAL and terminate.

8.3.1.3 Stochastic hill climbing

 Stochastic hill climbing does not focus on all the nodes.
 It selects one node at random and decides whether it should be
expanded or search for a better one.

8.3.1.4 Random-restart hill climbing

 Random-restart algorithm is based on try and try strategy.
 It iteratively searches the node and selects the best one at each
step until the goal is not found.
 The success depends most commonly on the shape of the hill.
 If there are few plateaus, local maxima, and ridges, it becomes
easy to reach the destination.
Limitations of Hill climbing algorithm
 Local Maxima: It is that peak of the mountain which is highest than all
its neighboring states but lower than the global maxima. It is not the goal
peak because there is another peak higher than it. Refer Figure 1.20.

Figure 1.20 – Local Maxima

 Plateau: It is a flat surface area where no uphill exists. It becomes

difficult for the climber to decide that in which direction he should move
to reach the goal point.Refer Figure 1.21.

Figure 1.21 – Plateau

PREPARED BY: R.Mohan, G.Selvapriya, S.Shanmuganathan, AP/CSE 58

CS 3491 – ARTIFICIAL INTELLIGENCE AND MACHINE LEARNING UNIT 1

 Ridges: It is a challenging problem where the person finds two or more

local maxima of the same height commonly. It becomes difficult for the
person to navigate the right point and stuck to that point itself. Refer
Figure 1.22

Figure 1.22 – Ridges

8.3.2 Simulated Annealing

 Simulated annealing is similar to the hill climbing algorithm.
 It works on the current situation.
 It picks a random move instead of picking the best move.
 If the move leads to the improvement of the current situation, it is
always accepted as a step towards the solution state, else it accepts
the move having a probability less than 1.
 This search technique was first used in 1980 to solve VLSI
layout problems.
 It is also applied for factory scheduling and other large optimization
tasks.

8.3.3.Local Beam Search

 The local beam search algorithm keeps track of states rather than just
one.
 It begins with randomly k generated states.

PREPARED BY: R.Mohan, G.Selvapriya, S.Shanmuganathan, AP/CSE 59

CS 3491 – ARTIFICIAL INTELLIGENCE AND MACHINE LEARNING UNIT 1

 It selects k randomly generated states, and expands them at each

step.
 If any state is a goal state, the search stops with success.
 Else it selects the best k successors from the complete list and repeats
the same process.
 In local beam search, the necessary information is shared between the
parallel search processes.
Disadvantages of Local Beam search
 This search can suffer from a lack of diversity among the k states.
 It is an expensive version of hill climbing search.

9. Explain in detail about Adversarial Search and its types in Artificial

Intelligence.

ADVERSARIAL SEARCH
9.1 Adversarial Search in Artificial Intelligence
9.2 Elements of Game Playing search
9.3 Example:
9.3.1 Game Tree For Tic Tac Toe
9.4 Types Of Algorithms In Adversarial Search
9.4.1 Minimax Strategy / Minimax Algorithm
9.4.2 Alpha – Beta Pruning

9.1 Adversarial Search in Artificial Intelligence

AI Adversarial search:
 Adversarial search is a game-playing technique where the agents are
surrounded by a competitive environment.
 A conflicting goal is given to the agents (multiagent).
 These agents compete with one another and try to defeat one another in
order to win the game.

 Such conflicting goals give rise to the adversarial search.

9.2 Elements of Game Playing search

 To play a game, we use a game tree to know all the possible choices and
to pick the best one out.
 There are following elements of a game-playing:
 S0: It is the initial state from where a game begins.
 PLAYER (s): It defines which player is having the current turn to
make a move in the state.
 ACTIONS (s): It defines the set of legal moves to be used in a state.

PREPARED BY: R.Mohan, G.Selvapriya, S.Shanmuganathan, AP/CSE 60

CS 3491 – ARTIFICIAL INTELLIGENCE AND MACHINE LEARNING UNIT 1

 RESULT (s, a): It is a transition model which defines the result of a

move.
 TERMINAL-TEST (s): It defines that the game has ended and returns
true.
 UTILITY (s,p): It defines the final value with which the game has
ended. This function is also known as Objective function or Payoff
function.
 The price which the winner will get i.e.
o (-1): If the PLAYER loses.
o (+1): If the PLAYER wins.
o (0): If there is a draw between the PLAYERS.

9.3 Example:
 Game tic-tac-toe, has two or three possible outcomes.
 Either to win, to lose, or to draw the match with values +1,-1 or 0.
 Game tree designed for tic-tac-toe refer Figure 1.23.
 Here, the node represents the game state and edges represent the moves
taken by the players.

Figure 1.23 – A Game Tree for Tic Tac Toe

PREPARED BY: R.Mohan, G.Selvapriya, S.Shanmuganathan, AP/CSE 61

CS 3491 – ARTIFICIAL INTELLIGENCE AND MACHINE LEARNING UNIT 1

9.3.1 GAME TREE FOR TIC TAC TOE

 INITIAL STATE (S0): The top node in the game-tree represents the initial
state in the tree and shows all the possible choice to pick out one.
 PLAYER (s): There are two players, MAX and MIN. MAX begins the game
by picking one best move and place X in the empty square box.
 ACTIONS (s): Both the players can make moves in the empty boxes
chance by chance.
 RESULT (s, a): The moves made by MIN and MAX will decide the
outcome of the game.
 TERMINAL-TEST(s): When all the empty boxes will be filled, it will be the
terminating state of the game.
 UTILITY: At the end, we will get to know who wins: MAX or MIN, and
accordingly, the price will be given to them.

9.4 TYPES OF ALGORITHMS IN ADVERSARIAL SEARCH

9.4.1 Minimax Algorithm
9.4.2 Alpha-beta Pruning

9.4.1 MINIMAX STRATEGY / MINIMAX ALGORITHM

 In artificial intelligence, minimax is a decision-making strategy
under game theory, which is used to minimize the losing chances in
a game and to maximize the winning chances.
 This strategy is also known as ‘Minmax,’ ’MM,’ or ‘Saddle point.’
 Basically, it is a two-player game strategy where if one wins, the other
loose the game.
 Example: playing chess
 MINIMAX algorithm is a backtracking algorithm where it backtracks
to pick the best move out of several choices.
 MINIMAX strategy follows the DFS (Depth-first search) concept.
 Here, we have two players MIN and MAX, and the game is played
alternatively between them, i.e., when MAX made a move, then the
next turn is of MIN.

Game Tree
 A game tree is where the nodes represent the states of the game and
edges represent the moves made by the players in the game.
 Players will be two namely:
o MIN: Decrease the chances to win the game.
o MAX: Increases his chances of winning the game.

PREPARED BY: R.Mohan, G.Selvapriya, S.Shanmuganathan, AP/CSE 62

CS 3491 – ARTIFICIAL INTELLIGENCE AND MACHINE LEARNING UNIT 1

 MAX will choose that path which will increase its utility value and MIN
will choose the opposite path which could help it to minimize MAX’s
utility value.

Working of Min-Max Algorithm:

o Consider an example of game-tree which is representing the two-
player game.
o In this example, there are two players one is called Maximizer and
other is called Minimizer.
o Maximizer will try to get the Maximum possible score, and Minimizer
will try to get the minimum possible score.
o This algorithm applies DFS, so have to go all the way through the
leaves to reach the terminal nodes.
o At the terminal node, the terminal values are given so we will compare
those value and backtrack the tree until the initial state occurs.

Steps involved in solving the two-player game tree:

Step-1:
Let's take A is the initial state of the tree. Suppose maximizer takes
first turn which has worst-case initial value =- infinity, and minimizer
will take next turn which has worst-case initial value = +infinity.

Step 2:
Now, first we find the utilities value for the Maximizer, its initial value
is -∞,
For node D max(-1,- -∞) => max(-1,4)= 4
o For Node E max(2, -∞) => max(2, 6)= 6
o For Node F max(-3, -∞) => max(-3,-5) = -3
o For node G max(0, -∞) = max(0, 7) = 7

PREPARED BY: R.Mohan, G.Selvapriya, S.Shanmuganathan, AP/CSE 63

CS 3491 – ARTIFICIAL INTELLIGENCE AND MACHINE LEARNING UNIT 1

Step 3:
In the next step, it's a turn for minimizer, so it will compare all nodes
value with +∞, and will find the 3 rd layer node values.
o For node B= min(4,6) = 4
o For node C= min (-3, 7) = -3

Step 4:
Now it's a turn for Maximizer, and it will again choose the maximum of all
nodes value and find the maximum value for the root node.
In this game tree, there are only 4 layers, hence reach immediately to the
root node, but in real games, there will be more than 4 layers.
o For node A max(4, -3)= 4

PREPARED BY: R.Mohan, G.Selvapriya, S.Shanmuganathan, AP/CSE 64

CS 3491 – ARTIFICIAL INTELLIGENCE AND MACHINE LEARNING UNIT 1

That was the complete workflow of the minimax two player game.

Minimax algorithm

9.4.2 ALPHA – BETA PRUNING

 Alpha-beta pruning is an advance version of MINIMAX algorithm.
Drawbacks of MINIMAX
 The drawback of minimax strategy is that it explores each node in the
tree deeply to provide the best path among all the paths.
 This increases its time complexity.
 Therefore, alpha-beta pruning reduces this drawback of minimax
strategy by less exploring the nodes of the search tree.

PREPARED BY: R.Mohan, G.Selvapriya, S.Shanmuganathan, AP/CSE 65

CS 3491 – ARTIFICIAL INTELLIGENCE AND MACHINE LEARNING UNIT 1

 The method used in alpha-beta pruning is that it cutoff the

search by exploring less number of nodes. It prunes the unwanted
branches using the pruning technique (discussed in adversarial
search).
 Alpha-beta pruning works on two threshold values, i.e., ?
(alpha) and ? (beta).
 ?: (alpha) - It is the best highest value, a MAX player can have. It
is the lower bound, which represents negative infinity value.
 ?:(beta) - It is the best lowest value, a MIN player can have. It is
the upper bound which represents positive infinity.
 The main condition which required for alpha-beta pruning is:
α>=β
Working of Alpha-beta Pruning
 Let's take an example of two-player search tree to understand the
working of Alpha-beta pruning
 here, ? (alpha) will represent the maximum value of the nodes,
and ?(beta) will represent the minimum value of the nodes.

Step 1:
At the first step the, Max player will start first move from node A
where α= -∞ and β= +∞, these value of alpha and beta passed down to
node B where again α= -∞ and β= +∞, and Node B passes the same
value to its child D.

Step 2:
At Node D, the value of α will be calculated as its turn for Max. The
value of α is compared with firstly 2 and then 3, and the max (2, 3) = 3
will be the value of α at node D and node value will also 3.

PREPARED BY: R.Mohan, G.Selvapriya, S.Shanmuganathan, AP/CSE 66

CS 3491 – ARTIFICIAL INTELLIGENCE AND MACHINE LEARNING UNIT 1

Step 3:
Now algorithm backtrack to node B, where the value of β will change
as this is a turn of Min, Now β= +∞, will compare with the available
subsequent nodes value, i.e. min (∞, 3) = 3, hence at node B now
α= -∞, and β= 3.

In the next step, algorithm traverse the next successor of Node B

which is node E, and the values of α= -∞, and β= 3 will also be passed.

Step 4:
At node E, Max will take its turn, and the value of alpha will change.
The current value of alpha will be compared with 5, so max (-∞, 5) = 5,
hence at node E α= 5 and β= 3, where α>=β, so the right successor of
E will be pruned, and algorithm will not traverse it, and the value at
node E will be 5.

PREPARED BY: R.Mohan, G.Selvapriya, S.Shanmuganathan, AP/CSE 67

CS 3491 – ARTIFICIAL INTELLIGENCE AND MACHINE LEARNING UNIT 1

Step 5:
At next step, algorithm again backtrack the tree, from node B to node
A. At node A, the value of alpha will be changed the maximum
available value is 3 as max (-∞, 3)= 3, and β= +∞, these two values
now passes to right successor of A which is Node C.
At node C, α=3 and β= +∞, and the same values will be passed on to
node F.
Step 6:
At node F, again the value of α will be compared with left child which
is 0, and max(3,0)= 3, and then compared with right child which is 1,
and max(3,1)= 3 still α remains 3, but the node value of F will become
1.

Step 7:
Node F returns the node value 1 to node C, at C α= 3 and β= +∞, here
the value of beta will be changed, it will compare with 1 so min (∞, 1) =
1. Now at C, α=3 and β= 1, and again it satisfies the condition α>=β,
so the next child of C which is G will be pruned, and the algorithm will
not compute the entire sub-tree G.

PREPARED BY: R.Mohan, G.Selvapriya, S.Shanmuganathan, AP/CSE 68

CS 3491 – ARTIFICIAL INTELLIGENCE AND MACHINE LEARNING UNIT 1

Step 8:
C now returns the value of 1 to A here the best value for A is max
(3, 1) = 3.
Following is the final game tree which is the showing the nodes which
are computed and nodes which has never computed.
Hence the optimal value for the maximizer is 3 for this example.

 The game will be started from the last level of the game tree,
and the value will be chosen accordingly.
The rule which will be followed is: “Explore nodes if necessary
otherwise prune the unnecessary nodes.”

PREPARED BY: R.Mohan, G.Selvapriya, S.Shanmuganathan, AP/CSE 69

CS 3491 – ARTIFICIAL INTELLIGENCE AND MACHINE LEARNING UNIT 1

10. Explain in detail about constraint satisfaction problems (CSP) with an

Suitable example.

Constraint Satisfaction Problems (CSP)

10.1 Constraint satisfaction problems (CSP) – Definition
10.2 CSP Algorithms and Problems
10.3 Map Coloring / Graph Coloring Problem

10.1 Constraint satisfaction problems (CSP) – Definition

 A constraint satisfaction problem (CSP) is a problem that requires its
solution within some limitations or conditions also known as
constraints.
 It consists of the following:
 A finite set of variables which stores the solution (V = {V1, V2,
V3,....., Vn})
 A set of discrete values known as domain from which the solution
is picked (D = {D1, D2, D3,.....,Dn})
 A finite set of constraints (C = {C1, C2, C3,......, Cn})

10.2 CSP Algorithms and Problems

Solving Constraint Satisfaction Problems

The requirements to solve a constraint satisfaction problem (CSP) is:
 A state-space
 The notion of the solution.
A state in state-space is defined by assigning values to some or all
variables such as {X1=v1, X2=v2, and so on…}.

An assignment of values to a variable can be done in three ways:

 Consistent or Legal Assignment: An assignment which does not
violate any constraint or rule is called Consistent or legal assignment.

PREPARED BY: R.Mohan, G.Selvapriya, S.Shanmuganathan, AP/CSE 70

CS 3491 – ARTIFICIAL INTELLIGENCE AND MACHINE LEARNING UNIT 1

 Complete Assignment: An assignment where every variable is

assigned with a value, and the solution to the CSP remains consistent.
Such assignment is known as Complete assignment.
 Partial Assignment: An assignment which assigns values to some of
the variables only. Such type of assignments are called Partial
assignments.

Types of Domains in CSP

Constraint Types in CSP

10.3 Map Coloring / Graph Coloring Problem

Solution:
 It is a map of Australia; it consists of states and territories as in Figure
1.24.
 The task will be coloring each region with the colors red, green or blue.
 In such case, no neighboring regions will have the same color.
 The set of the domain in each variable is

PREPARED BY: R.Mohan, G.Selvapriya, S.Shanmuganathan, AP/CSE 71

CS 3491 – ARTIFICIAL INTELLIGENCE AND MACHINE LEARNING UNIT 1

Figure 1.24– States and Territories of Australia

 Formulate the variables to the particular regions as shown in figure 1.25:

Figure 1.25– Map Coloring of Australia

 The constraints require neighboring regions to have distinct colors.

 Since there are nine places where regions border, there are nine
constraints:

Coloring a map with three colors:

 If it starts coloring with the state SA, then any of the 3 colors can be
chosen.
 Means, here there are 3 possibilities.
 And when moving to the next node, it says WA and there are 2 colors that
can be chosen. That means it has 2 possibilities.
 The remaining states can be colored with the colors carefully.
 And there will be possibility chance of choosing only one particular color
that does not match the neighboring state’s color.
 So the number of possibilities is 3 x 2 = 6.
 As the state T is not connected, it can be mapped with any of the 3 colors.
There will be 3 possibilities to color this state.
 So, there can be 6 x 3 = 18 solutions that can be obtained by coloring
each of the six possibilities with 3 times with 3 colors to the state TA.
 So the number of solutions is 6 x 3 = 18.

PREPARED BY: R.Mohan, G.Selvapriya, S.Shanmuganathan, AP/CSE 72

CS 3491 – ARTIFICIAL INTELLIGENCE AND MACHINE LEARNING UNIT 1

Therefore, there will be 18 solutions for coloring a map with three

colors.

Figure 1.26 – Map Coloring Solution

There are many possible solutions to this problem, such as in figure 1.26

PREPARED BY: R.Mohan, G.Selvapriya, S.Shanmuganathan, AP/CSE 73

CS 3491 – ARTIFICIAL INTELLIGENCE AND MACHINE LEARNING UNIT 2

MAILAM ENGINEERING COLLEGE

Mailam – 604 304
(Approved by AICTE-New Delhi, Affiliated to Anna University-Chennai,
Accredited by National Board of Accreditation-New Delhi & TCS-Chennai)
DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING
II YEAR / IV SEM
CS 3491 – ARTIFICIAL INTELLIGENCE AND MACHINE LEARNING
UNIT II PROBABILISTIC REASONING

SYLLABUS:
Acting under uncertainty – Bayesian inference – Naïve Bayes
models. Probabilistic reasoning – Bayesian networks – exact
inference in BN – approximate inference in BN – causal networks.

PART A
1. Define uncertainty and list the causes of uncertainty.
Uncertainty:
 The knowledge representation, A→B, means if A is true then B is true,
but a situation where not sure about whether A is true or not then cannot
express this statement, this situation is called uncertainty.
 So to represent uncertain knowledge, uncertain reasoning or probabilistic
reasoning is used.
Causes of uncertainty:
1. Causes of uncertainty in the real world
2. Information occurred from unreliable sources.
3. Experimental Errors
4. Equipment fault
5. Temperature variation
6. Climate change.

2. Define Probabilistic reasoning. Mention the need of probabilistic

reasoning in AI
Probabilistic reasoning:
 Probabilistic reasoning is a way of knowledge representation, the
concept of probability is applied to indicate the uncertainty in
knowledge.
Need of probabilistic reasoning in AI:
 When there are unpredictable outcomes.
 When specifications or possibilities of predicates becomes too large to
handle.
 When an unknown error occurs during an experiment.

1
PREPARED BY: R.Mohan, G.Selvapriya, S.Shanmuganathan, AP/CSE
CS 3491 – ARTIFICIAL INTELLIGENCE AND MACHINE LEARNING UNIT 2

3. List the Ways to solve problems with uncertain knowledge.

 Bayes' rule
 Bayesian Statistics

4. Define Probability and the probability of occurrence.

 Probability can be defined as a chance that an uncertain event will
occur.
 The value of probability always remains between 0 and 1 that
represent ideal uncertainties.
o 0 ≤ P(A) ≤ 1, where P(A) is the probability of an event A.
o P(A) = 0, indicates total uncertainty in an event A.
o P(A) =1, indicates total certainty in an event A.
 Formula to find the probability of an uncertain event

P(¬A) = probability of a not happening event.

(¬A) + P(A) = 1.

5. Define the terms event, sample space, random variables, prior

probability and posterior probability.
 Event: Each possible outcome of a variable is called an event.
 Sample space: The collection of all possible events is called sample
space.
 Random variables: Random variables are used to represent the
events and objects in the real world.
 Prior probability: The prior probability of an event is probability
computed before observing new information.
 Posterior Probability: The probability that is calculated after all
evidence or information has taken into account. It is a combination of
prior probability and new information.

6. Define Conditional probability.

 Conditional probability is a probability of occurring an event when
another event has already happened.
 Let's suppose, to calculate the event A when event B has already
occurred, "the probability of A under the conditions of B", it is:

2
PREPARED BY: R.Mohan, G.Selvapriya, S.Shanmuganathan, AP/CSE
CS 3491 – ARTIFICIAL INTELLIGENCE AND MACHINE LEARNING UNIT 2

Where P(A⋀B)= Joint probability of a and B

P(B)= Marginal probability of B.

7. In a class, there are 70% of the students who like English and 40%
of the students who likes English and mathematics, and then what
is the percent of students those who like English also like
mathematics?
Solution:
Let, A is an event that a student likes Mathematics
B is an event that a student likes English.

Hence, 57% are the students who like English also like Mathematics

8. Define Bayesian Inference.

 Bayesian inference is a probabilistic approach to machine learning that
provides estimates of the probability of specific events.
 Bayesian inference is a statistical method for understanding the
uncertainty inherent in prediction problems.
 Bayesian inference algorithm can be viewed as a Markov Chain Monte
Carlo algorithm that uses prior probability distributions to optimize the
likelihood function.

9. List Bayes Theorem or Bayes Rule

 Bayes' theorem can be derived using product rule and conditional
probability of event A with known event B:
 Product Rule:
1. P(A ⋀ B)= P(A|B) P(B) or
2. P(A ⋀ B)= P(B|A) P(A)
 Conditional Probability:
 Let A and B are events,
 P(A|B) is the conditional probability of A given B,
 P(B|A) is the conditional probability of B given A.
 Equating right hand side of both the equations will get:

The above equation (a) is called as Bayes' rule or Bayes' theorem.

This equation is basic of most modern AI systems for probabilistic
inference.

3
PREPARED BY: R.Mohan, G.Selvapriya, S.Shanmuganathan, AP/CSE
CS 3491 – ARTIFICIAL INTELLIGENCE AND MACHINE LEARNING UNIT 2

 P(A|B) is known as posterior, is the Probability of hypothesis A

when occurred an evidence B.
 P(B|A) is called the likelihood, in which hypothesis is true, then
calculate the probability of evidence.
 P(A) is called the prior probability, probability of hypothesis before
considering the evidence
 P(B) is called marginal probability, pure probability of an
evidence.

10. Suppose we want to perceive the effect of some unknown cause,

and want to compute that cause, then the Bayes' rule becomes:

what is the probability that a patient has diseases meningitis

with a stiff neck?
Given Data:
A doctor is aware that disease meningitis causes a patient to have
a stiff neck, and it occurs 80% of the time. He is also aware of
some more facts, which are given as follows:
o The Known probability that a patient has meningitis disease is
1/30,000.
o The Known probability that a patient has a stiff neck is 2%.
Solution
Let a be the proposition that patient has stiff neck and b be the
proposition that patient has meningitis.
So, calculate the following as:
P(a|b) = 0.8
P(b) = 1/30000
P(a)= .02

Hence, assume that 1 patient out of 750 patients has meningitis

disease with a stiff neck.

11. Consider two events: A (it will rain tomorrow) and B (the sun will
shine tomorrow).
 Use Bayes’ theorem to compute the posterior probability of each event
occurring, given the resulting weather conditions for today:
P(A|sunny) = P(sunny|A) * P(A) / P(sunny)

4
PREPARED BY: R.Mohan, G.Selvapriya, S.Shanmuganathan, AP/CSE
CS 3491 – ARTIFICIAL INTELLIGENCE AND MACHINE LEARNING UNIT 2

P(B|sunny) = P(sunny|B) * P(B) / P(sunny)

where sunny is our evidence (the resulting weather condition for
today).

12. What are the Application of Bayes' theorem in Artificial

intelligence?
 It is used to calculate the next step of the robot when the already
executed step is given.
 Bayes' theorem is helpful in weather forecasting.

13. Define Bayesian Network.

14. Define Joint probability distribution.

 If variables are x1, x2, x3,....., xn, then the probabilities of a different
combination of x1, x2, x3.. xn, are known as Joint probability
distribution.
 P[x 1, x 2, x 3,.....,xn], can be written as the following way in terms of the
joint probability distribution.
= P[x 1| x 2, x 3,....., xn]. p[x 2, x 3,....., xn]
= P[x 1| x 2, x 3,....., xn]P[x 2|x 3,....., xn]....P[xn-1|x n]P[xn].
 In general for each variable Xi,
P(Xi|Xi-1,........., X1) = P(Xi |Parents(Xi ))

15. Write an algorithm for Constructing Bayesian Network

5
PREPARED BY: R.Mohan, G.Selvapriya, S.Shanmuganathan, AP/CSE
CS 3491 – ARTIFICIAL INTELLIGENCE AND MACHINE LEARNING UNIT 2

16. Define Global semantics and local semantics.

Global Semantics

Local Semantics

17. List the ways to understand the semantics of Bayesian Network

There are two ways to understand the semantics of the Bayesian
network, which is given below:
1. To understand the network as the representation of the Joint
probability distribution.
It is helpful to understand how to construct the network.
2. To understand the network as an encoding of a collection of
conditional independence statements.
It is helpful in designing inference procedure.

18. What are the Applications of Bayesian networks in AI?

1. Spam filtering
2. Bio monitoring
3. Information retrieval
4. Image processing
5. Gene regulatory network
6. Turbo code
7. Document classification

19. Define Bayesian Inference.

 Bayesian Network is to perform inference, which computes the
marginal probability P(V=v) for each node V and each possible
instantiation v.
 Inference can also be done on a Bayesian network when the values of
some nodes are known (as evidence) and wish to compute the
likelihood of values of other nodes.
 There are two types of inference on Bayesian networks: exact and
approximate.
 Exact inference algorithms compute the exact values of each marginal
or posterior probability, while approximate inference algorithms
sacrifice some accuracy of the probabilities to report results quickly.

6
PREPARED BY: R.Mohan, G.Selvapriya, S.Shanmuganathan, AP/CSE
CS 3491 – ARTIFICIAL INTELLIGENCE AND MACHINE LEARNING UNIT 2

20. Define Exact Inference.

 The goal of an exact inference algorithm is to report the exact values
for either the marginal (P(V = v)) or posterior probabilities (P(V = v|e))
for each instantiation v of each node V, possible given some evidence e
of other node values.

21. List the common exact inference algorithms –

 Pearl’s algorithm
 Lauritzen-Spiegelhalter algorithm.

22. Define Pearl’s Algorithm.

 Pearl’s algorithm is a linear-time algorithm that computes the
posterior probabilities of each node given evidence of singly-connected
networks.
 Pearl introduced the notation λ (X = x) for the diagnostic support of a
node X with value x, which is the probability of evidence below X given
that X = x.

23. Define Lauritzen-Spiegelhalter (LS) Algorithm.

 The Lauritzen-Spiegelhalter (LS) algorithm is an inference algorithm
for Bayesian networks that works on all models.

24. Define Causal Network or Causal Bayesian Network

 A causal network is an acyclic digraph arising from an evolution of
a substitution system, and representing its history.
 In an evolution of a multiway system, each substitution event is a
vertex in a causal network.
 Two events which are related by causal dependence, meaning one
occurs just before the other, have an edge between the corresponding
vertices in the causal network.
 More precisely, the edge is a directed edge leading from the past event
to the future event.

25. Define Structural Causal Models (SCMs).

 SCMs consist of two parts: a graph, which visualizes causal
connections, and equations, which express the details of the
connections. a graph is a mathematical construction that consists
of vertices (nodes) and edges (links).
 SCMs use a special kind of graph, called a Directed Acyclic Graph
(DAG), for which all edges are directed and no cycles exist.
 DAGs are a common starting place for causal inference.

7
PREPARED BY: R.Mohan, G.Selvapriya, S.Shanmuganathan, AP/CSE
CS 3491 – ARTIFICIAL INTELLIGENCE AND MACHINE LEARNING UNIT 2

26. List the purpose of do-operator in causal networks.

 The do-operator is a mathematical representation of a physical
intervention.
 If the model starts with Z → X → Y, simulate an intervention in X by
deleting all the incoming arrows to X, and manually setting X to some
value x_0.

27. List the rules of Do-Calculus.

8
PREPARED BY: R.Mohan, G.Selvapriya, S.Shanmuganathan, AP/CSE
CS 3491 – ARTIFICIAL INTELLIGENCE AND MACHINE LEARNING UNIT 2

PART B

1. Explain the concept of uncertainty and acting under uncertainty with

suitable example. Explain in detail about probabilistic reasoning.

UNCERTAINITY & PROBABILISTIC REASONING

1.1 Uncertainty:
1.1.1Causes of uncertainty
1.2 Probabilistic reasoning:
1.2.1 Need of probabilistic reasoning in AI
1.2.2 Ways to solve problems with uncertain
knowledge
1.2.3 Probability
1.2.4 Conditional probability
1.2.4.1 Example

Agents almost never have access to the whole truth about their environment.
Agents must, therefore, act under uncertainty.

Handling uncertain knowledge

In this section, we look more closely at the nature of uncertain knowledge.
We will use a simple diagnosis example to illustrate the concepts involved.
Diagnosis whether for medicine, automobile repair, or whatever-is a task that
almost always involves uncertainty. Let us try to write rules for dental diagnosis
using first-order logic, so that we can see how the logical approach breaks down.
Consider the following rule:

The problem is that this rule is wrong. Not all patients with toothaches have
cavities; some of them have gum disease, an abscess, or one of several other
problems:

Unfortunately, in order to make the rule true, we have to add an almost unlimited
list of possible causes. We could try turning the rule into a causal rule:

9
PREPARED BY: R.Mohan, G.Selvapriya, S.Shanmuganathan, AP/CSE
CS 3491 – ARTIFICIAL INTELLIGENCE AND MACHINE LEARNING UNIT 2

But this rule is not right either; not all cavities cause pain The only way to fix the
rule is to make it logically exhaustive: to augment the left-hand side with all the
qualifications required for a cavity to cause a toothache. Even then, for the
purposes of diagnosis, one must also take into account the possibility that the
patient might have a toothache and a cavity that are unconnected. Trying to use
first-order logic to cope with a domain like medical diagnosis thus fails for three
main reasons:

0 Laziness: It is too much work to list the complete set of antecedents or

consequents needed to ensure an exceptionless rule and too hard to use such
rules.
0 Theoretical ignorance: Medical science has no complete theory for the domain.
0 Practical ignorance: Even if we know all the rules, we might be uncertain about
a particular patient because not all the necessary tests have been or can be run.

The connection between toothaches and cavities is just not a logical

consequence in either direction. This is typical of the medical domain, as well as
most other judgmental domains: law, business, design, automobile repair,
gardening, dating, and so on. The agent's knowledge can at best provide only a
degree of belief in the relevant sentences. Our main tool for dealing with degrees of
belief will be probability theory, which assigns to each sentence a numerical
degree of belief between 0 and 1.

Probability provides a way of summarizing the uncertainty that comes from

our laziness and ignorance. We might not know for sure what afflicts a particular
patient, but we believe that there is, say, an 80% chance-that is, a probability of
0.8-that the patient has a cavity if he or she has a toothache.

That is, we expect that out of all the situations that are indistinguishable
from the current situation as far as the agent's knowledge goes, the patient will
have a cavity in 80% of them. This belief could be derived from statistical data-80%
of the toothache patients seen so far have had cavities-or from some general rules,
or from a combination of evidence sources.

The 80% summarizes those cases in which all the factors needed for a cavity
to cause a toothache are present and other cases in which the patient has both
toothache and cavity but the two are unconnected. The missing 20% summarizes
all the other possible causes of toothache that we are too lazy or ignorant to confirm
or deny.

10
PREPARED BY: R.Mohan, G.Selvapriya, S.Shanmuganathan, AP/CSE
CS 3491 – ARTIFICIAL INTELLIGENCE AND MACHINE LEARNING UNIT 2

Design for a decision-theoretic agent

Below algorithm sketches the structure of an agent that uses decision theory
to select actions. The agent is identical, at an abstract level, to the logical agent.
The primary difference is that the decision-theoretic agent's knowledge of the
current state is uncertain; the agent's belief state is a representation of the
probabilities of all possible actual states of the world. As time passes, the agent
accumulates more evidence and its belief state changes. Given the belief state, the
agent can make probabilistic predictions of action outcomes and hence select the
action with highest expected utility.

1.1 Uncertainty:
 The knowledge representation, A→B, means if A is true then B is true,
but a situation where not sure about whether A is true or not then
cannot express this statement, this situation is called uncertainty.
 So to represent uncertain knowledge, uncertain reasoning or
probabilistic reasoning is used.

1.1.1Causes of uncertainty:
Causes of uncertainty in the real world
1. Information occurred from unreliable sources.
2. Experimental Errors
3. Equipment fault
4. Temperature variation
5. Climate change.

1.2 Probabilistic reasoning:

 Probabilistic reasoning is a way of knowledge representation, the
concept of probability is applied to indicate the uncertainty in
knowledge.

1.2.1 Need of probabilistic reasoning in AI:

o When there are unpredictable outcomes.

11
PREPARED BY: R.Mohan, G.Selvapriya, S.Shanmuganathan, AP/CSE
CS 3491 – ARTIFICIAL INTELLIGENCE AND MACHINE LEARNING UNIT 2

o When specifications or possibilities of predicates becomes too

large to handle.
o When an unknown error occurs during an experiment.
1.2.2 Ways to solve problems with uncertain knowledge:
o Bayes' rule
o Bayesian Statistics

1.2.3 Probability:
 Probability can be defined as a chance that an uncertain event
will occur.
 The value of probability always remains between 0 and 1 that
represent ideal uncertainties.
o 0 ≤ P(A) ≤ 1, where P(A) is the probability of an event A.
o P(A) = 0, indicates total uncertainty in an event A.
o P(A) =1, indicates total certainty in an event A.
 Formula to find the probability of an uncertain event

P(¬A) = probability of a not happening event.

P(¬A) + P(A) = 1.

o Event: Each possible outcome of a variable is called an event.

o Sample space: The collection of all possible events is called
sample space.
o Random variables: Random variables are used to represent the
events and objects in the real world.
o Prior probability: The prior probability of an event is
probability computed before observing new information.
o Posterior Probability: The probability that is calculated after
all evidence or information has taken into account. It is a
combination of prior probability and new information.

1.2.4 Conditional probability:

 Conditional probability is a probability of occurring an event when
another event has already happened.

12
PREPARED BY: R.Mohan, G.Selvapriya, S.Shanmuganathan, AP/CSE
CS 3491 – ARTIFICIAL INTELLIGENCE AND MACHINE LEARNING UNIT 2

 Let's suppose, to calculate the event A when event B has already

occurred, "the probability of A under the conditions of B", it is:

Where P(A⋀B)= Joint probability of a and B

P(B)= Marginal probability of B.

 If the probability of A is given and to find the probability of B, then it

is:

1.2.4.1 Example:
In a class, there are 70% of the students who like English and 40% of
the students who likes English and mathematics, and then what is
the percent of students those who like English also like mathematics?
Solution:
Let, A is an event that a student likes Mathematics
B is an event that a student likes English.

Hence, 57% are the students who like English also like Mathematics

2. Explain in detail about Bayesian inference and Naive Bayes Model or Naive
Bayes Theorem or Bayes Rule.

Naive Bayes Model or Naive Bayes Theorem or Bayes Rule

2.1 Bayesian Inference
2.2 Bayes Theorem or Bayes Rule
2.3 Example - Applying Bayes' rule:
2.4 Application of Bayes' theorem in Artificial intelligence

2.1 Bayesian Inference

13
PREPARED BY: R.Mohan, G.Selvapriya, S.Shanmuganathan, AP/CSE
CS 3491 – ARTIFICIAL INTELLIGENCE AND MACHINE LEARNING UNIT 2

 Bayesian inference algorithm can be viewed as a Markov Chain Monte

Carlo algorithm that uses prior probability distributions to optimize the
likelihood function.
 The basis of Bayesian inference is the notion of apriori and a posteriori
probabilities.
o The priori probability is the probability of an event before any
evidence is considered.
o The posteriori probability is the probability of an event after
taking into account all available evidence.
 For example, if we want to know the probability that it will rain
tomorrow, our priori probability would be based on our knowledge of
the weather patterns in our area.

2.2 Bayes Theorem or Bayes Rule

 Bayes' theorem can be derived using product rule and conditional
probability of event A with known event B:
 Product Rule:
3. P(A ⋀ B)= P(A|B) P(B) or
4. P(A ⋀ B)= P(B|A) P(A)
 Conditional Probability:
 Let A and B are events,
 P(A|B) is the conditional probability of A given B,
 P(B|A) is the conditional probability of B given A.
 Equating right hand side of both the equations will get:

The above equation (a) is called as Bayes' rule or Bayes' theorem.

This equation is basic of most modern AI systems for probabilistic
inference.
 P(A|B) is known as posterior, is the Probability of hypothesis A
when occurred an evidence B.
 P(B|A) is called the likelihood, in which hypothesis is true, then
calculate the probability of evidence.
 P(A) is called the prior probability, probability of hypothesis before
considering the evidence
 P(B) is called marginal probability, pure probability of an
evidence.
 In general,
P (B) = P(A)*P(B|Ai),

14
PREPARED BY: R.Mohan, G.Selvapriya, S.Shanmuganathan, AP/CSE
CS 3491 – ARTIFICIAL INTELLIGENCE AND MACHINE LEARNING UNIT 2

 Hence the Bayes' rule can be written as:

Where A1, A2, A3,........, An is a set of mutually exclusive and

exhaustive events.

2.3 Example 1 - Applying Bayes' rule:

Suppose we want to perceive the effect of some unknown
cause, and want to compute that cause, then the Bayes' rule
becomes:

what is the probability that a patient has diseases meningitis

with a stiff neck?
Given Data:
A doctor is aware that disease meningitis causes a patient to
have a stiff neck, and it occurs 80% of the time. He is also
aware of some more facts, which are given as follows:
The Known probability that a patient has meningitis disease
is 1/30,000.
The Known probability that a patient has a stiff neck is 2%.
Solution
Let a be the proposition that patient has stiff neck and b be the
proposition that patient has meningitis.
So, calculate the following as:
P(a|b) = 0.8
P(b) = 1/30000
P(a)= .02

Hence, assume that 1 patient out of 750 patients has meningitis

disease with a stiff neck.

15
PREPARED BY: R.Mohan, G.Selvapriya, S.Shanmuganathan, AP/CSE
CS 3491 – ARTIFICIAL INTELLIGENCE AND MACHINE LEARNING UNIT 2

Example 2 - Applying Bayes' rule:

 Consider two events: A (it will rain tomorrow) and B (the sun will shine
tomorrow).
 Use Bayes’ theorem to compute the posterior probability of each event
occurring, given the resulting weather conditions for today:
P(A|sunny) = P(sunny|A) * P(A) / P(sunny)
P(B|sunny) = P(sunny|B) * P(B) / P(sunny)
where sunny is our evidence (the resulting weather condition for
today).
 From these equations,
o if event A is more likely to result in sunny weather than event B,
then the posterior probability of A occurring, given that the
resulting weather condition for today is sunny, will be higher than
the posterior probability of B occurring.
o Conversely, if event B is more likely to result in sunny weather than
event A, then the posterior probability of B occurring, given that the
resulting weather condition for today is sunny, will be higher than
the posterior probability of A occurring.

2.4 Application of Bayes' theorem in Artificial intelligence:

 It is used to calculate the next step of the robot when the already
executed step is given.
 Bayes' theorem is helpful in weather forecasting.

Naive Bayes Theorem

The dentistry example illustrates a commonly occurring pattern in which a single
cause directly influences a number of effects, all of which are conditionally
independent, given the cause. The full joint distribution can be written as

Such a probability distribution is called a naive Bayes model—“naive” because it is

often used (as a simplifying assumption) in cases where the “effect” variables are not
strictly independent given the cause variable. (The naive Bayes model is sometimes
called a Bayesian classifier, a somewhat careless usage that has prompted true
Bayesians to call it the idiot Bayes model.) In practice, naive Bayes systems often
work very well, even when the conditional independence assumption is not strictly
true

16
PREPARED BY: R.Mohan, G.Selvapriya, S.Shanmuganathan, AP/CSE
CS 3491 – ARTIFICIAL INTELLIGENCE AND MACHINE LEARNING UNIT 2

3.Explain in detail about Bayesian Network

Bayesian Network
3.1 Bayesian Network
3.2 Joint probability distribution:
3.3 Constructing Bayesian Network
3.4 Example
3.5 The semantics of Bayesian Network
3.6 Applications of Bayesian networks in AI

3.1 Bayesian Network

 "A Bayesian network is a probabilistic graphical model which represents a
set of variables and their conditional dependencies using a directed
acyclic graph."
 It is also called a Bayes network, belief network, decision network,
or Bayesian model.
 Bayesian Network can be used for building models from data and experts
opinions,and it consists of two parts:
o Directed Acyclic Graph
o Table of conditional probabilities
 The generalized form of Bayesian network that represents and solve
decisionproblems under uncertain knowledge is known as an Influence
diagram.
 It is used to represent conditional dependencies.
 It can also be used in various tasks including prediction, anomaly
detection, diagnostics, automated insight, reasoning, time series
prediction, and decision making under uncertainty.
 A Bayesian network graph is made up of nodes and Arcs (directed links).

Figure 2.1 – Example for Bayesian Network

 Each node corresponds to the random variables, and a variable can be
continuous or discrete.

17
PREPARED BY: R.Mohan, G.Selvapriya, S.Shanmuganathan, AP/CSE
CS 3491 – ARTIFICIAL INTELLIGENCE AND MACHINE LEARNING UNIT 2

 Arc or directed arrows represent the causal relationship or conditional

probabilities between random variables.
 These directed links or arrows connect the pair of nodes in the graph.
 These links represent that one node directly influence the other node, and
if there is no directed link that means that nodes are independent with
each other.
Example
In the figure 2.1, A, B, C, and D are random variables represented by
the nodes of the network graph.
 Considering node B, which is connected with node A by a directed
arrow, then node A is called the parent of Node B.
 Node C is independent of node A.
 The Bayesian network graph does not contain any cyclic graph. Hence,
it is known as a directed acyclic graph or DAG.
 The Bayesian network has mainly two components:
1. Causal Component
2. Actual numbers

 Each node in the Bayesian network has condition probability

distribution P(Xi |Parent(Xi) ), which determines the effect of the
parent on that node.
 Bayesian network is based on Joint probability distribution and
conditional probability.

3.2 Joint probability distribution:

3.3 Constructing Bayesian Network

18
PREPARED BY: R.Mohan, G.Selvapriya, S.Shanmuganathan, AP/CSE
CS 3491 – ARTIFICIAL INTELLIGENCE AND MACHINE LEARNING UNIT 2

Global Semantics

Local Semantics

Markov Blanket
 Each node is conditionally independent of all others given its
Markov blanket: parents + children + children’s parents

3.4 Example:
Harry installed a new burglar alarm at his home to detect burglary.
The alarm reliably responds at detecting a burglary but also responds
for minor earthquakes. Harry has two neighbors David and Sophia,
who have taken a responsibility to inform Harry at work when they
hear the alarm. David always calls Harry when he hears the alarm,
but sometimes he got confused with the phone ringing and calls at
that time too. On the other hand, Sophia likes to listen to high music,
so sometimes she misses to hear the alarm. Here we would like to
compute the probability of Burglary Alarm.

Problem:
Calculate the probability that alarm has sounded, but there is
neither a burglary, nor an earthquake occurred, and David and
Sophia both called the Harry.

19
PREPARED BY: R.Mohan, G.Selvapriya, S.Shanmuganathan, AP/CSE
CS 3491 – ARTIFICIAL INTELLIGENCE AND MACHINE LEARNING UNIT 2

Solution:
 The Bayesian network for the above problem is given in figure 2.2.
The network structure is showing that burglary and earthquake is
the parent node of the alarm and directly affecting the probability
of alarm's going off, but David and Sophia's calls depend on alarm
probability.

Variables:
Burglar, Earthquake, Alarm, John Calls, Mary Calls

Network topology reflects “causal” knowledge

 A burglar can set the alarm off
 An earthquake can set the alarm off
 The alarm can cause Mary to call
 The alarm can cause John to call

Figure 2.2 - The Bayesian network for the example problem

20
PREPARED BY: R.Mohan, G.Selvapriya, S.Shanmuganathan, AP/CSE
CS 3491 – ARTIFICIAL INTELLIGENCE AND MACHINE LEARNING UNIT 2

All events occurring in this network:

o Burglary (B)
o Earthquake(E)
o Alarm(A)
o David Calls(D)
o Sophia calls(S)

Write the events of problem statement in the form of probability:

P[D, S, A, B, E],

Rewrite the probability statement using joint probability distribution:

Let's take the observed probability for the Burglary and earthquake
component:
 P(B=True) = 0.002, which is the probability of burglary.
 P(B=False)= 0.998, which is the probability of no burglary.
 P(E=True)= 0.001, which is the probability of a minor earthquake
 P(E=False)= 0.999, Which is the probability that an earthquake not
occurred.

Conditional probability table for Alarm A:

The Conditional probability of Alarm A depends on Burglar and
earthquake:

B E P(A= True) P(A= False)

True True 0.94 0.06

True False 0.95 0.04

False True 0.31 0.69

False False 0.001 0.999

Conditional probability table for David Calls:

The Conditional probability of David that he will call depends on the
probability of Alarm.

21
PREPARED BY: R.Mohan, G.Selvapriya, S.Shanmuganathan, AP/CSE
CS 3491 – ARTIFICIAL INTELLIGENCE AND MACHINE LEARNING UNIT 2

A P(D= True) P(D= False)

True 0.91 0.09

False 0.05 0.95

Conditional probability table for Sophia Calls:

The Conditional probability of Sophia that she calls is depending on
its Parent Node "Alarm."

A P(S= True) P(S= False)

True 0.75 0.25

False 0.02 0.98

From the formula of joint distribution, the problem statement in the form of
probability distribution:
P(S, D, A, ¬B, ¬E) = P (S|A) *P (D|A)*P (A|¬B ^ ¬E) *P (¬B) *P (¬E).
= 0.75* 0.91* 0.001* 0.998*0.999
= 0.00068045.
Hence, a Bayesian network can answer any query about the domain
by using Joint distribution.

3.5 The semantics of Bayesian Network:

There are two ways to understand the semantics of the Bayesian network,
which is given below:
1. To understand the network as the representation of the Joint
probability distribution.
It is helpful to understand how to construct the network.
2. To understand the network as an encoding of a collection of
conditional independence statements.
It is helpful in designing inference procedure.

3.6 Applications of Bayesian networks in AI

Bayesian networks find applications in a variety of tasks such as:
1. Spam filtering:
a. A spam filter is a program that helps in detecting unsolicited and
spam mails. Bayesian spam filters check whether a mail is spam
or not.

22
PREPARED BY: R.Mohan, G.Selvapriya, S.Shanmuganathan, AP/CSE
CS 3491 – ARTIFICIAL INTELLIGENCE AND MACHINE LEARNING UNIT 2

2. Biomonitoring:
a. This involves the use of indicators to quantify the concentration of
chemicals in the human body.
3. Information retrieval:
a. Bayesian networks assist in information retrieval for research,
which is a constant process of extracting information from
databases.
4. Image processing:
a. A form of signal processing, image processing uses mathematical
operations to convert images into digital format.
5. Gene regulatory network:
a. A Bayesian network is an algorithm that can be applied to gene
regulatory networks in order to make predictions about the effects
of genetic variations on cellular phenotypes.
b. Gene regulatory networks are a set of mathematical equations that
describe the interactions between genes, proteins, and metabolites.
c. They are used to study how genetic variations affect the
development of a cell or organism.
6. Turbo code:
a. Turbo codes are a type of error correction code capable of achieving
very high data rates and long distances between error correcting
nodes in a communications system.
b. They have been used in satellites, space probes, deep-space
missions, military communications systems, and civilian wireless
communication systems, including WiFi and 4G LTE cellular
telephone systems.
7. Document classification:
a. The main issue is to assign a document multiple classes. The task
can be achieved manually and algorithmically. Since manual effort
takes too much time, algorithmic documentation is done to
complete it quickly and effectively.

4. Explain in detail about Bayesian Inference and its type Exact Inference
with suitable example.

Exact inference in Bayesian networks

The basic task for any probabilistic inference system is to compute the
posterior probability distribution for a set of query variables, given some observed
event-that is, some assignment of values to a set of evidence variables. We will use

23
PREPARED BY: R.Mohan, G.Selvapriya, S.Shanmuganathan, AP/CSE
CS 3491 – ARTIFICIAL INTELLIGENCE AND MACHINE LEARNING UNIT 2

the notation X denotes the query variable; E denotes the set of evidence variables
El, . . . , Em, and e is a particular observed event; Y will denote the nonevidence
variables Yl, . . . , (sometimes called the hidden variables). Thus, the complete set
of variables X = {X} U E U Y. A typical query asks for the posterior probability
distribution P(X|e)4

In the burglary network, we might observe the event in which JohnCalls =

true and MaryCalls = true. We could then ask for, say, the probability that a
burglary has occurred:

Inference by enumeration

Conditional probability can be computed by summing terms from the full joint
distribution. More specifically, a query P(X|e) can be answered using Equation,
which we repeat here for convenience:

Now, a Bayesian network gives a complete representation of the full joint

distribution. More specifically, Equation shows that the terms P(x, e, y) in the joint
distribution can be written as products of conditional probabilities from the
network. Therefore, a query can be answered using a Bayesian network by
computing sums of products of conditional probabibities from the network. In Figure
an algorithm, ENUMERATE-JOINT-ASK, was given for inference by enumeration
from the full joint distribution. The algorithm takes as input a full joint distribution
P and looks up values therein. It is a simple matter to modify the algorithm so that
it takes as input a Bayesian network bn and "looks up" joint entries by multiplying
the corresponding CPT entries from bn.

Consider the query P(Burglary1 JohnCalls = true, &Jury Calls = true). The
hidden variables for this query are Earthquake and Alarm. From Equation (13.6),
using initial letters for the variables in order to shorten the expressions, we have

The semantics of Bayesian networks then gives us an expression in terms of

CPT entries. For simplicity, we will do this just for Burglizry = true:

24
PREPARED BY: R.Mohan, G.Selvapriya, S.Shanmuganathan, AP/CSE
CS 3491 – ARTIFICIAL INTELLIGENCE AND MACHINE LEARNING UNIT 2

To compute this expression, we have to add four terms, each computed by

multiplying five numbers. In the worst case, where we have to sum out almost all
the variables, the complexity of the algorithm for a network with n Boolean
variables is O(n2n). An improvement can be obtained from the following simple
observations: the P(b) term is a constant and can be moved outside the
summaltions over a and e, and the 13(e) term can be moved outside the summation
over a. Hence, we have

This expression can be evaluated by looping through the variables in order,

multiplying CPT entries as we go. For each summation, we also need to loop over
the variable's possible values. The structure of this computation is shown in Figure.
Using the numbers from Figure, we obtain P(b| j , m) = a x 0.00059224. 'The
correspondingc omputation for ~b yields a x 0.0014919; hence

This expression can be evaluated by looping through the variables in order,

multiplying CPT entries as we go. For each summation, we also need to loop over
the variable's possible values.

The structure of this computation is shown in above Figure. Using the numbers
from Figure, we obtain P(b| j , m) = a x 0.00059224. 'The co~respondingc
omputation for ~b yields a x 0.0014919; hence

25
PREPARED BY: R.Mohan, G.Selvapriya, S.Shanmuganathan, AP/CSE
CS 3491 – ARTIFICIAL INTELLIGENCE AND MACHINE LEARNING UNIT 2

That is, the chance of a burglary, given calls from both neighbors, is about 28%.
The evaluation process for the expression in Equation is shown as an expression
tree in Figure.

The variable elimination algorithm

The enumeration algorithm can be improved substantially by eliminating
repeated calculations of the kind illustrated in Figure. The idea is simple: do the
calculation once and save the results for later use. This is a form of dynamic
programming. There are several versions of this approach; we present the variable
elimination algorithm, which is the simplest. Variable elimination works by
evaluating expressions such as Equation in right-to-left order (that is, bottom-up in
Figure). Intermediate results are stored, and summations over each variable are
done only for those portions of the expression that depend on the variable. Let us
illustrate this process for the burglary network. We evaluate the expression

The complexity of exact inference

We have argued that variable elimination is more efficient than enumeration

because it avoids repeated computations (as well as dropping irrelevant variables).
The time and space requirements of variable elimination are dominated by the size
of the largest factor constructed during the operation of the algorithm. This in turn
is determined by the order of elimination of variables and by the structure of the
network.

The burglary network of Figure belongs to the family of networks in which

there is at most one undirected path between any two nodes in the network. These
are called singly connected networks or polytrees, and they have a particularly
nice property: The time and space complexity of exact inference in polytrees is linear
in the size of the network. I-Iere, the size is defined as the number of CPT entries; if

26
PREPARED BY: R.Mohan, G.Selvapriya, S.Shanmuganathan, AP/CSE
CS 3491 – ARTIFICIAL INTELLIGENCE AND MACHINE LEARNING UNIT 2

the number of parents of each node is bounded by a constant, then the complexity
will also be linear in the number of nodes. These results hold for any ordering
consistent with the topological ordering of the network .

For multiply connected networks, such as that of Figure, variable

elimination can have exponential time and space complexity in the worst case, even
when the number of parents per node is bounded. This is not surprising when one
considers that, because it includes inference in propositional logic as a special case,
inference in Bayesian networks is 1W-hard. In fact, it can be shown that the
problem is as hard as that of computing the number of satisfying assignments for a
propositional logic formula. This means that it is #P-hard ("number-P hard")-that is,
strictly harder than NP-complete problems.

There is a close connection between the complexity of Bayesian network

inference and the complexity of constraint satisfaction problems (CSPs), the
difficulty of solving a discrete CSP is related to how "tree-lilce" its constraint graph
is Measures such as hypertree width, which bound the complexity of solving a
CSP, can also be applied directly to Bayesian networks. Moreover, the variable
elimination algorithm can be generalized to solve CSPs as well as Bayesian
networks.

5. Explain Causal Network or Causal Bayesian Network in Machine

5.1 Causal Network or Causal Bayesian Network
 A causal network is an acyclic digraph arising from an evolution of
a substitution system, and representing its history.
 In an evolution of a multiway system, each substitution event is a vertex
in a causal network.
 Two events which are related by causal dependence, meaning one occurs
just before the other, have an edge between the corresponding vertices in
the causal network.
 More precisely, the edge is a directed edge leading from the past event to
the future event.
 Refer Figure 2.3 for an example causal network.
 A CBN is a graph formed by nodes representing random variables,
connected by links denoting causal influence.

27
PREPARED BY: R.Mohan, G.Selvapriya, S.Shanmuganathan, AP/CSE
CS 3491 – ARTIFICIAL INTELLIGENCE AND MACHINE LEARNING UNIT 2

Figure 2.3 – Causal Network Example

 Some causal networks are independent of the choice of evolution, and

these are called causally invariant.

 Structural Causal Models (SCMs).

 A network with 2 nodes (fire icon and smoke icon) and 1 edge (arrow
pointing from fire to smoke).
 This network can be both a Bayesian or causal network.
 The key distinction, however, is when interpreting this network.
 For a Bayesian network, we view the nodes as variables and the arrow
as a conditional probability, namely the probability of smoke given
information about fire.
 When interpreting this as a causal network, we still view nodes as
variables, however, the arrow indicates a causal connection.

28
PREPARED BY: R.Mohan, G.Selvapriya, S.Shanmuganathan, AP/CSE
CS 3491 – ARTIFICIAL INTELLIGENCE AND MACHINE LEARNING UNIT 2

 In this case, both interpretations are valid. However, if we were to flip the
edge direction, the causal network interpretation would be invalid, since
smoke does not cause fire.

Implementing Causal Inference

1.The do-operator
 The do-operator is a mathematical representation of a physical
intervention.
 If the model starts with Z → X → Y, simulate an intervention in X by
deleting all the incoming arrows to X, and manually setting X to some
value x_0. Refer Figure 2.4 denotes the example of do-operator.

Figure 2.4 – do-operator Example

P(Y|X) is the conditional probability that is, the probability of Y given

an observation of X. While, P(Y|do(X)) is the probability of Y given
an intervention in X.

29
PREPARED BY: R.Mohan, G.Selvapriya, S.Shanmuganathan, AP/CSE
CS 3491 – ARTIFICIAL INTELLIGENCE AND MACHINE LEARNING UNIT 2

2: Confounding
A simple example of confounding is shown in the figure 2.5 below.

Figure 2.5 – Confounding Example

 In this example, age is a confounder of education and wealth. In

other words, if trying to evaluate the impact of education on wealth
one would need to adjust for age.
 Adjusting for (or conditioning on) age just means that when
looking at age, education, and wealth data, one would compare
data points within age groups, not between age groups.
 Confounding is anything that leads to P(Y|X) being different than
P(Y|do(X)).

3: Estimating Causal Effects

 Treatment effect = (Outcome under E) minus (Outcome under C),
that is the difference between the outcome a child would receive if
assigned to treatment E and the outcome that same child would
receive if assigned to treatment C. These are called potential
outcomes.

6. Explain approximate inference in Bayesian network (BN)

Given the intractability of exact inference in large networks, we will now

consider approximate inference methods. This section describes randomized
sampling algorithms, also called Monte Carlo algorithms, that provide approximate
answers whose accuracy depends on the number of samples generated.

They work by generating random events based on the probabilities in the

Bayes net and counting up the different answers found in those random events.
With enough samples, we can get arbitrarily close to recovering the true probability
distribution—provided the Bayes net has no deterministic conditional distributions

30
PREPARED BY: R.Mohan, G.Selvapriya, S.Shanmuganathan, AP/CSE
CS 3491 – ARTIFICIAL INTELLIGENCE AND MACHINE LEARNING UNIT 2

Direct sampling methods

The primitive element in any sampling algorithm is the generation of samples
from a known probability distribution. For example, an unbiased coin can be
thought of as a random variable Coin with values (heads, tails) and a prior
distribution P(Coin) = (0.5,0.5). Sampling from this distribution is exactly like
flipping the coin: with probability 0.5 it will return heads, and with probability 0.5
it will return tails.
Given a source of random numbers r uniformly distributed in the range [0,1],
it is a simple matter to sample any distribution on a single variable, whether
discrete or continuous. This is done by constructing the cumulative distribution for
the variable and returning the first value whose cumulative probability exceeds r

We begin with a random sampling process for a Bayes net that has no evidence
associated with it. The idea is to sample each variable in turn, in topological order.
The probability distribution from which the value is sampled is conditioned on the
values already assigned to the variable’s parents. (Because we sample in topological
order, the parents are guaranteed to have values already.) This algorithm is shown
in Figure. Applying it to the network with the ordering Cloudy, Sprinkler, Rain,
WetGrass, we might produce a random event as follows:

Rejection sampling in Bayesian networks

Rejection sampling is a general method for producing samples from a hard-
to-sample distribution given an easy-to-sample distribution. In its simplest form, it
can be used to compute conditional probabilities that is, to determine P(X |e). The
REJECTION-SAMPLING algorithm is shown in Figure. First, it generates samples
from the prior distribution specified by the network. Then, it rejects all those that
do not match the evidence. Finally, the estimateˆP (X =x|e) is obtained by counting
how often X =x occurs in the remaining samples.

31
PREPARED BY: R.Mohan, G.Selvapriya, S.Shanmuganathan, AP/CSE
CS 3491 – ARTIFICIAL INTELLIGENCE AND MACHINE LEARNING UNIT 2

Let ˆP(X |e) be the estimated distribution that the algorithm returns; this
distribution is computed by normalizing NPS(X,e), the vector of sample counts for
each value of X where the sample agrees with the evidence e:

Inference by Markov chain simulation

In this section, we describe the Markov chain Monte Carlo (MCMC)
algorithm for inference in Bayesian networks. We will first describe what the
algorithm does, then we will explain why it works and why it has such a
complicated name.

The MCMC algorithm

MCMC generates each event by making a random change to the preceding

event. It is therefore helpful to think of the network as being in a particular current
state specifying a value for every variable. The next state is generated by randomly
sampling a value for one of the nonevidence variables Xi,conditioned on the current
values of the variables in the Markov blanket of Xi. MCMC therefore wanders
randomly around the state space-the space of possible complete assignments-
flipping one variable at a time, but keeping the evidence variables fixed.

Consider the query P(Rain1 Sprinkler = true, Wet Grass = true) applied to the
network in Figure. The evidence variables Sprinkler and WetGrass are fixed to their
observed values and the hidden variables Cloudy and Rain are initialized randomly-
let us say to true and false respectively. Thus, the initial state is [true, true, false,
true]. Now the following steps are executed repeatedly:

32
PREPARED BY: R.Mohan, G.Selvapriya, S.Shanmuganathan, AP/CSE
CS 3491 – ARTIFICIAL INTELLIGENCE AND MACHINE LEARNING UNIT 2

1. Cloudy is sampled, given the current values of its Markov blanket variables: in
this case, we sample from P(Cloudy1 Sprinkler = true, Rain =false). Suppose the
result is Cloudy =false. Then the new current state is [false, true, false, true].

2. Rain is sampled, given the current values of its Markov blanket variables: in this
case, we sample from P(Rain1 Cloudy =false, Sprinkler = true, WetGrass = true).
Suppose this yields Rain = true. The new current state is [false, true, true, true].
Each state visited during this process is a sample that contributes to the estimate
for the query variable Rain. If the process visits 20 states where Rain is true and 60
states where Rain is false, then the answer to the query is NORMALIZE

33
PREPARED BY: R.Mohan, G.Selvapriya, S.Shanmuganathan, AP/CSE
CS 3491 – ARTIFICIAL INTELLIGENCE AND MACHINE LEARNING UNIT 3

MAILAM ENGINEERING COLLEGE

Mailam – 604 304
(Approved by AICTE-New Delhi, Affiliated to Anna University-Chennai,
Accredited by National Board of Accreditation-New Delhi & TCS-Chennai)
DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING
II YEAR / IV SEM
CS3491 – ARTIFICIAL INTELLIGENCE AND MACHINE LEARNING
UNIT III SUPERVISED LEARNING

SYLLABUS:
Introduction to machine learning – Linear Regression Models: Least
squares, single & multiple variables, Bayesian linear regression,
gradient descent, Linear Classification Models: Discriminant function –
Probabilistic discriminative model - Logistic regression, Probabilistic
generative model – Naive Bayes, Maximum margin classifier – Support
vector machine, Decision Tree, Random forests
PART A
1. Define Machine Learning.
 Arthur Samuel, an early American leader in the field of computer gaming
and artificial intelligence, coined the term “Machine Learning ” in 1959
while at IBM.
 He defined machine learning as “the field of study that gives computers
the ability to learn without being explicitly programmed “.
 Machine learning is programming computers to optimize a performance
criterion using example data or past experience. The model may be
predictive to make predictions in the future, or descriptive to gain
knowledge from data.

2 Mention the various classification of Machine Learning

3. Define Supervised learning

 Supervised learning is the machine learning task of learning a function
that maps an input to an output based on example input-output pairs.
 The given data is labeled.

PREPARED BY: R.Mohan, G.Selvapriya, S.Shanmuganathan, AP/CSE 1

CS 3491 – ARTIFICIAL INTELLIGENCE AND MACHINE LEARNING UNIT 3

 Both classification and regression problems are supervised learning

problems.

5. Define Unsupervised learning

 Unsupervised learning is a type of machine learning algorithm used to draw
inferences from datasets consisting of input data without labeled responses.
 In unsupervised learning algorithms, classification or categorization is not
included in the observations.
 In unsupervised learning the agent learns patterns in the input without any
explicit feedback.
 The most common unsupervised learning task is clustering: detecting
potentially useful clusters of input examples.

6. What is Reinforcement learning?

 In reinforcement learning the agent learns from a series of reinforcements:
rewards and punishments.
 Reinforcement learning is the problem of getting an agent to act in the world
so as to maximize its rewards.
 A learner is not told what actions to take as in most forms of machine
learning but instead must discover which actions yield the most reward by
trying them.

7. What is Semi-supervised learning?

 Semi-Supervised learning is a type of Machine Learning algorithm that
represents the intermediate ground between Supervised and Unsupervised
learning algorithms.
 It uses the combination of labeled and unlabeled datasets during the training
period, where an incomplete training signal is given: a training set with some
of the target outputs missing.

8. How to Categorize algorithm based on required Output?

 Classification
 Regression
 Clustering

9. Define Classification.
 The Classification algorithm is a Supervised Learning technique that is
used to identify the category of new observations on the basis of training
data.
 In Classification, a program learns from the given dataset or observations
and then classifies new observation into a number of classes or groups.

PREPARED BY: R.Mohan, G.Selvapriya, S.Shanmuganathan, AP/CSE 2

CS 3491 – ARTIFICIAL INTELLIGENCE AND MACHINE LEARNING UNIT 3

 Such as, Yes or No, 0 or 1, Spam or Not Spam, cat or dog, etc. Classes can
be called as targets/labels or categories.

10. Define Regression.

 Regression is a supervised learning technique which helps in finding the
correlation between variables and enables us to predict the continuous
output variable based on the one or more predictor variables.
 It is mainly used for prediction, forecasting, time series modeling, and
determining the causal-effect relationship between variables.

11. Define Clustering.

 Clustering or cluster analysis is a machine learning technique, which
groups the unlabeled dataset.
 It can be defined as "A way of grouping the data points into different
clusters, consisting of similar data points.
 The objects with the possible similarities remain in a group that has
less or no similarities with another group."

12. What is Linear Regression?

 In statistics, linear regression is a linear approach to modeling the
relationship between a dependent variable and one or more independent
variables.
 Let X be the independent variable and Y be the dependent variable.
 A linear relationship between these two variables as follows:

Where,
m: Slope
c: y-intercept

13. What is Least Squares Regression Line?

14. Narrate Least Squares Regression Equation

 The equation that minimizes the total of all squared prediction
errors for known Y scores in the original correlation analysis.

PREPARED BY: R.Mohan, G.Selvapriya, S.Shanmuganathan, AP/CSE 3

CS 3491 – ARTIFICIAL INTELLIGENCE AND MACHINE LEARNING UNIT 3

where
Y´ represents the predicted value;
X represents the known value;
b and a represent numbers calculated from the original correlation
analysis

15. List and define the types of Linear Regression.

It is of two types: Simple and Multiple.
 Simple Linear Regression is where only one independent variable is
present and the model has to find the linear relationship of it with the
dependent variable
 Equation of Simple Linear Regression, where bo is the intercept, b1 is
coefficient or slope, x is the independent variable and y is the dependent
variable.

o In Multiple Linear Regression there are more than one independent

variables for the model to find the relationship.
Equation of Multiple Linear Regression, where bo is the intercept,
b1,b2,b3,b4…,bn are coefficients or slopes of the independent variables
x1,x2,x3,x4…,xn and y is the dependent variable.

16.Define Linear Regression Model.

 A Linear Regression model’s main aim is to find the best fit linear line and
the optimal values of intercept and coefficients such that the error is
minimized.

17. What is error or residual?

 Error is the difference between the actual value and Predicted value and the
goal is to reduce this difference.
 The vertical distance between the data point and the regression line is known
as error or residual.
 Each data point has one residual and the sum of all the differences is known
as the Sum of Residuals/Errors.

PREPARED BY: R.Mohan, G.Selvapriya, S.Shanmuganathan, AP/CSE 4

CS 3491 – ARTIFICIAL INTELLIGENCE AND MACHINE LEARNING UNIT 3

18. Define Bayesian Regression.

 Bayesian Regression is used when the data is insufficient in the dataset or
the data is poorly distributed.
 The output of a Bayesian Regression model is obtained from a probability
distribution.
 The aim of Bayesian Linear Regression is to find the ‘posterior‘
distribution for the model parameters.
 The expression for Posterior is :

where
o Posterior: It is the probability of an event to occur; say, H, given that
another event; say, E has already occurred. i.e., P(H | E).
o Prior: It is the probability of an event H has occurred prior to another
event. i.e., P(H)
o Likelihood: It is a likelihood function in which some parameter variable
is marginalized.

19. List the Advantages and Disadvantages of Bayesian Regression.

 Very effective when the size of the dataset is small.
 Particularly well-suited for on-line based learning (data is received in
real-time), as compared to batch based learning, where we have the
entire dataset on our hands before we start training the model. This is
because Bayesian Regression doesn’t need to store data.
 The Bayesian approach is a tried and tested approach and is very
robust, mathematically. So, one can use this without having any extra
prior knowledge about the dataset.
Disadvantages of Bayesian Regression:
 The inference of the model can be time-consuming.
 If there is a large amount of data available for our dataset, the Bayesian
approach is not worth it.

20. What are the two types of Classifications problem?

 Two-class problems :
o Binary representation or Binary Classifier:
o If the classification problem has only two possible outcomes, then
it is called as Binary Classifier.
o There is a single target variable t ∈ {0, 1}
 t = 1 represents class C1
 t = 0 represents class C2

PREPARED BY: R.Mohan, G.Selvapriya, S.Shanmuganathan, AP/CSE 5

CS 3491 – ARTIFICIAL INTELLIGENCE AND MACHINE LEARNING UNIT 3

o Examples: YES or NO, MALE or FEMALE, SPAM or NOT SPAM,

CAT or DOG, etc.

o Multi-class Problems:
o If a classification problem has more than two outcomes, then it is
called as Multi-class Classifier.
o Example: Classifications of types of crops, Classification of types
of music.

21. List the different Types of ML Classification Algorithms.

 Logistic Regression
 K-Nearest Neighbors
 Support Vector Machines
 Kernel SVM
 Naïve Bayes
 Decision Tree Classification
 Random Forest Classification

22. What is Discriminant function?

 A function of a set of variables that is evaluated for samples of events or
objects and used as an aid in discriminating between or classifying them.
 A discriminant function (DF) maps independent (discriminating) variables
into a latent variable D.
 DF is usually postulated to be a linear function:
D = a0 + a1 x1 + a2 x2 ... aNxN

22 Define Probabilistic discriminative models.

 Discriminative models are a class of supervised machine learning
models which make predictions by estimating conditional
probability P(y|x).
 For the two-class classification problem, the posterior probability of class
C1 can be written as a logistic sigmoid acting on a linear function of x

 For the multi-class case, the posterior probability of class Ck is given by a

softmax transformation of a linear function of x

PREPARED BY: R.Mohan, G.Selvapriya, S.Shanmuganathan, AP/CSE 6

CS 3491 – ARTIFICIAL INTELLIGENCE AND MACHINE LEARNING UNIT 3

23. Define Logistics Regression

 Logistic regression is the Machine Learning algorithms, under the
classification algorithm of Supervised Learning technique .
 Logistic regression is used to describe data and the relationship between
one dependent variable and one or more independent variables.
 The independent variables can be nominal, ordinal, or of interval type.
 Logistic regression predicts the output of a categorical dependent
variable.
 Therefore the outcome must be a categorical or discrete value.

24. DefineLogistic Function or Sigmoid Function.

 The logistic function is also known as the sigmoid function.
 The sigmoid function is a mathematical function used to map the
predicted values to probabilities.
 The value of the logistic regression must be between 0 and 1, so it forms
a curve like the "S" form.
 The S-form curve is called the Sigmoid function or the logistic function.

25. List the types of Logistic Regression.

 Logistic Regression can be classified into three types:
o Binomial: In binomial Logistic regression, there can be only two
possible types of the dependent variables, such as 0 or 1, Pass or Fail,
etc.
o Multinomial: In multinomial Logistic regression, there can be 3 or
more possible unordered types of the dependent variable, such as
"cat", "dogs", or "sheep"
o Ordinal: In ordinal Logistic regression, there can be 3 or more
possible ordered types of dependent variables, such as "low",
"Medium", or "High".

26. What are the Steps in Logistic Regression?

 To implement the Logistic Regression using Python, the steps are given
below:
 Data Pre-processing step
 Fitting Logistic Regression to the Training set
 Predicting the test result
 Test accuracy of the result
 Visualizing the test set result.

27. List the advantages of Logistic Regression Algorithm.

 Logistic regression performs better when the data is linearly separable

PREPARED BY: R.Mohan, G.Selvapriya, S.Shanmuganathan, AP/CSE 7

CS 3491 – ARTIFICIAL INTELLIGENCE AND MACHINE LEARNING UNIT 3

 It does not require too many computational resources

 There is no problem scaling the input features
 It is easy to implement and train a model using logistic regression

28. Define Probabilistic Generative model

 Given a model of one conditional probability, and estimated probability
distributions for the variables X and Y, denoted P(X) and P(Y), can
estimate the conditional probability using Bayes' rule:

 A generative model is a statistical model of the joint probability

distribution on given observable variable X and target variable Y.
Given a generative model for P(X|Y), can estimate:

29. Define Discriminative model.

 A discriminative model is a model of the conditional probability of the
target Y, given an observation x given a discriminative model for
P(Y|X), can estimate:

 Classifier based on a generative model is a generative classifier, while a

classifier based on a discriminative model is a discriminative classifier

30. List the types of Generative models.

 Types of generative models are:
 Naive Bayes classifier or Bayesian network
 Linear discriminant analysis

31. Mention the algorithms in Discriminative models.

 Logistic regression
 Support Vector Machines
 Decision Tree Learning
 Random Forest

32. Define Support Vector Machine (SVM)

 Support Vector Machine(SVM) is a supervised machine learning
algorithm used for both classification and regression.

PREPARED BY: R.Mohan, G.Selvapriya, S.Shanmuganathan, AP/CSE 8

CS 3491 – ARTIFICIAL INTELLIGENCE AND MACHINE LEARNING UNIT 3

 The objective of SVM algorithm is to find a hyperplane in an N-

dimensional space that distinctly classifies the data points.
 Hyperplanes are decision boundaries that help classify the data points.

33. Define Hinge loss function

 The cost is 0 if the predicted value and the actual value are of the same
sign. If they are not, then calculate the loss value.

34. Define SVM Kernel.

 The SVM kernel is a function that takes low dimensional input space
and transforms it into higher-dimensional space, ie it converts non
separable problem to separable problem. It is mostly useful in non-
linear separation problems.

35. List the types of SVMs

 There are two different types of SVMs, each used for different things:
o Simple SVM: Typically used for linear regression and classification
problems.
o Kernel SVM: Has more flexibility for non-linear data .

36. What are the advantages and disadvantages of SVM?

Advantages
 Effective on datasets with multiple features, like financial or medical
data.
 Effective in cases where number of features is greater than the number
of data points.
 Its memory efficient as it uses a subset of training points in the decision
function called support vectors
 Different kernel functions can be specified for the decision functions and
its possible to specify custom kernels
Disadvantages
 If the number of features is a lot bigger than the number of data points,
choosing kernel functions and regularization term is crucial.
 SVMs don't directly provide probability estimates. Those are calculated
using an expensive five-fold cross-validation.
 Works best on small sample sets because of its high training time.

PREPARED BY: R.Mohan, G.Selvapriya, S.Shanmuganathan, AP/CSE 9

CS 3491 – ARTIFICIAL INTELLIGENCE AND MACHINE LEARNING UNIT 3

37. List the applications of SVM.

SVMs can be used to solve various real-world problems:
 SVMs are helpful in text and hypertext categorization.
 Classification of images can also be performed using SVMs.
 Classification of satellite data like SAR data using supervised SVM.
 Hand-written characters can be recognized using SVM.
 The SVM algorithm has been widely applied in the biological and other
sciences. They have been used to classify proteins with up to 90% of the
compounds classified correctly.

38. Define Decision Tree

 Decision Tree is a Supervised learning technique that can be used for
both classification and Regression problems.
 It is a tree-structured classifier, where internal nodes represent the
features of a dataset, branches represent the decision rules and each leaf
node represents the outcome.
 In a Decision tree, there are two nodes, the Decision Node and Leaf
Node.

39. List the types of Decision Trees

1. Categorical Variable Decision Tree: Decision Tree which has a categorical
target variable then it called a Categorical variable decision tree.
2. Continuous Variable Decision Tree: Decision Tree has a continuous target
variable then it is called Continuous Variable Decision Tree.

40. Mention the reason for using Decision Trees

 Decision Trees usually mimic human thinking ability while making a
decision, so it is easy to understand.
 The logic behind the decision tree can be easily understood because it
shows a tree-like structure.

41. List the algorithms used to construct Decision Trees:

 ID3 → (extension of D3)
 C4.5 → (successor of ID3)
 CART → (Classification And Regression Tree)
 CHAID → (Chi-square automatic interaction detection Performs multi-level
splits when computing classification trees)
 MARS → (multivariate adaptive regression splines)

PREPARED BY: R.Mohan, G.Selvapriya, S.Shanmuganathan, AP/CSE 10

CS 3491 – ARTIFICIAL INTELLIGENCE AND MACHINE LEARNING UNIT 3

PART B

1. Define Machine Learning. Give an introduction to Machine Learning.

INTRODUCTION TO MACHINE LEARNING

1.1 Machine Learning
1.1.1 Definition of Machine Learning
1.1.2 Definition of learning
1.1.3 Examples
1.1.3.1 Handwriting recognition learning problem
1.1.3.2 A robot driving learning problem
1.2 Classification of Machine Learning
1.2.1 Supervised learning
1.2.2 Unsupervised learning
1.2.3 Reinforcement learning
1.2.4 Semi-supervised learning
1.3 Categorizing based on required Output
1.3.1 Classification
1.3.2 Regression
1.3.3 Clustering

1.1 Machine Learning:

1.1.1 Definition of Machine Learning:
 Arthur Samuel, an early American leader in the field of
computer gaming and artificial intelligence, coined the term
“Machine Learning ” in 1959 while at IBM.
 He defined machine learning as “the field of study that gives
computers the ability to learn without being explicitly
programmed “.
 Machine learning is programming computers to optimize a
performance criterion using example data or past experience.
The model may be predictive to make predictions in the future,
or descriptive to gain knowledge from data.

1.1.2 Definition of learning:

 A computer program is said to learn from experience E with
respect to some class of tasks T and performance measure P , if
its performance at tasks T, as measured by P , improves with
experience E.

PREPARED BY: R.Mohan, G.Selvapriya, S.Shanmuganathan, AP/CSE 11

CS 3491 – ARTIFICIAL INTELLIGENCE AND MACHINE LEARNING UNIT 3

1.1.3 Examples
1.1.3.1Handwriting recognition learning problem
 Task T : Recognizing and classifying handwritten words
within images
 Performance P : Percent of words correctly classified
 Training experience E : A dataset of handwritten words with
given classifications
1.1.3.2A robot driving learning problem
 Task T : Driving on highways using vision sensors
 Performance P : Average distance traveled before an error
 Training experience E : A sequence of images and steering
commands recorded while observing a human driver

1.2 Classification of Machine Learning

 Machine learning implementations are classified into four major
categories, depending on the nature of the learning “signal” or
“response” available to a learning system which are as follows:
1.2.1 Supervised learning:
 Supervised learning is the machine learning task of learning a
function that maps an input to an output based on example
input-output pairs.
 The given data is labeled.
 Both classification and regression problems are supervised
learning problems.
 For example, the inputs could be camera images, each one
accompanied by an output saying “bus” or “pedestrian,” etc.
 An output like this is calleda label.
 The agent learns a function that, when given a new image,
predicts theappropriate label.

1.2.2 Unsupervised learning:

 Unsupervised learning is a type of machine learning algorithm
used to draw inferences from datasets consisting of input data
without labeled responses.
 In unsupervised learning algorithms, classification or
categorization is not included in the observations.
 In unsupervised learning the agent learns patterns in the input
without any explicit feedback.
 The most common unsupervised learning task is clustering:
detecting potentially useful clusters of input examples.

PREPARED BY: R.Mohan, G.Selvapriya, S.Shanmuganathan, AP/CSE 12

CS 3491 – ARTIFICIAL INTELLIGENCE AND MACHINE LEARNING UNIT 3

 For example, when shown millions of images taken from the

Internet, a computer vision system can identify a large cluster
of similar images which an English speaker would call “cats.”

1.2.3 Reinforcement learning:

 In reinforcement learning the agent learns from a series of
reinforcements: rewards and punishments.
 Reinforcement learning is the problem of getting an agent to act
in the world so as to maximize its rewards.
 A learner is not told what actions to take as in most forms of
machine learning but instead must discover which actions yield
the most reward by trying them.
 For example — Consider teaching a dog a new trick: we cannot
tell him what to do, what not to do, but we can reward/punish
it if it does the right/wrong thing.

1.2.4 Semi-supervised learning:

 Semi-Supervised learning is a type of Machine Learning
algorithm that represents the intermediate ground between
Supervised and Unsupervised learning algorithms.
 It uses the combination of labeled and unlabeled datasets
during the training period, where an incomplete training signal
is given: a training set with some of the target outputs missing.

1.3 Categorizing based on required Output

1.3.1 Classification:
 The Classification algorithm is a Supervised Learning technique
that is used to identify the category of new observations on the
basis of training data.
 In Classification, a program learns from the given dataset or
observations and then classifies new observation into a number
of classes or groups.
 Such as, Yes or No, 0 or 1, Spam or Not Spam, cat or dog, etc.
Classes can be called as targets/labels or categories.

1.3.1 Regression:
 Regression is a supervised learning technique which helps in
finding the correlation between variables and enables us to
predict the continuous output variable based on the one or more
predictor variables.

PREPARED BY: R.Mohan, G.Selvapriya, S.Shanmuganathan, AP/CSE 13

CS 3491 – ARTIFICIAL INTELLIGENCE AND MACHINE LEARNING UNIT 3

 It is mainly used for prediction, forecasting, time series

modeling, and determining the causal-effect relationship
between variables.

1.3.2 Clustering:
 Clustering or cluster analysis is a machine learning technique,
which groups the unlabeled dataset.
 It can be defined as "A way of grouping the data points into
different clusters, consisting of similar data points.
 The objects with the possible similarities remain in a group that
has less or no similarities with another group."

2. Explain in detail about Linear Regression Models. Or Explain Linear

Regression Models: Least squares, single & multiple variables, Bayesian
linear regression, gradient descent.

LINEAR REGRESSION MODELS

2.1 Linear Regression
2.2 Least Squares Regression Line
2.2.1 Least Squares Regression Equation
2.2.2 Least Squares Regression in Python
2.3 Types of Linear Regression
2.4 Linear Regression Model
2.5 Bayesian Regression
2.5.1 Implementation Of Bayesian Regression Using Python
2.6 Gradient descent
2.6.1 Cost Function
2.6.2 Gradient Descent Algorithm.

2.1 Linear Regression

 In statistics, linear regression is a linear approach to modeling the
relationship between a dependent variable and one or more
independent variables.
 Let X be the independent variable and Y be the dependent variable.
 A linear relationship between these two variables as follows:

Where,
m: Slope
c: y-intercept

PREPARED BY: R.Mohan, G.Selvapriya, S.Shanmuganathan, AP/CSE 14

CS 3491 – ARTIFICIAL INTELLIGENCE AND MACHINE LEARNING UNIT 3

 Linear regression algorithm shows a linear relationship between a

dependent (y) and one or more independent (x) variables, hence called
as linear regression.
 Linear regression finds how the value of the dependent variable is
changing according to the value of the independent variable.
 The linear regression model provides a sloped straight line
representing the relationship between the variables.
 Consider the below Figure 3.1, which represents the relationship
between independent and dependent variables

Figure 3.1 – Relationship between independent and dependent

variables

2.2 Least Squares Regression Line

 Least squares are a commonly used method in regression analysis
for estimating the unknown parameters by creating a model which
will minimize the sum of squared errors between the observed data
and the predicted data.
2.2.1 Least Squares Regression Equation
 The equation that minimizes the total of all squared prediction
errors for known Y scores in the original correlation analysis.

where
Y´ represents the predicted value;
X represents the known value;
b and a represent numbers calculated from the original correlation
analysis

PREPARED BY: R.Mohan, G.Selvapriya, S.Shanmuganathan, AP/CSE 15

CS 3491 – ARTIFICIAL INTELLIGENCE AND MACHINE LEARNING UNIT 3

2.2.2 Least Squares Regression in Python

Scenario: A rocket motor is manufactured by combining an igniter
propellant and a sustainer propellant inside a strong metal housing. It
was noticed that the shear strength of the bond between two propellers
is strongly dependent on the age of the sustainer propellant.
Problem Statement: Implement a simple linear regression algorithm
using Python to build a machine learning model that studies the
relationship between the shear strength of the bond between two
propellers and the age of the sustainer propellant.
Step 1: Import the required Python libraries.
# Importing Libraries
importnumpyasnp
import pandas aspd
importmatplotlib.pyplotasplt

Step 2: Next step is to read and load the dataset.

# Loading dataset
data = pd.read_csv('PropallantAge.csv')
data.head()
data.info()

Step 3: Create a scatter plot just to check the relationship between

these two variables.
# Plotting the data
plt.scatter(data['Age of Propellant'],data['Shear
Strength'])

Step 4: Next step is to assign X and Y as independent and dependent

variables respectively.
# Computing X and Y
X = data['Age of Propellant'].values
Y = data['Shear Strength'].values

Step 5: Compute the mean of variables X and Y to determine the values

of slope (m) and y-intercept.
Also, let n be the total number of data points.
# Mean of variables X and Y
mean_x = np.mean(X)
mean_y = np.mean(Y)
# Total number of data values
n = len(X)

PREPARED BY: R.Mohan, G.Selvapriya, S.Shanmuganathan, AP/CSE 16

CS 3491 – ARTIFICIAL INTELLIGENCE AND MACHINE LEARNING UNIT 3

Step 6: Calculate the slope and the y-intercept using the formulas
# Calculating 'm' and 'c'
num = 0
denom = 0
for i in range(n):
num += (X[i] - mean_x) * (Y[i] - mean_y)
denom += (X[i] - mean_x) ** 2
m = num / denom
c = mean_y - (m * mean_x)

# Printing coefficients
print("Coefficients")
print(m, c)

The above step has given the values of m and c.

Substituting them ,
Shear Strength =
2627.822359001296 + (-37.15359094490524)
* Age of Propellant

Step 7: The above equation represents the linear regression model.

Let’s plot this graphically. Refer fig 3.2
# Plotting Values and Regression Line
maxx_x = np.max(X) + 10
minn_x = np.min(X) - 10

# line values for x and y

x = np.linspace(minn_x, maxx_x, 1000)
y=c+m*x
# Plotting Regression Line
plt.plot(x, y, color='#58b970', label='Regression Line')
# Plotting Scatter Points
plt.scatter(X, Y, c='#ef5423', label='Scatter Plot')
plt.xlabel('Age of Propellant (in years)')
plt.ylabel('Shear Strength')
plt.legend()
plt.show()

PREPARED BY: R.Mohan, G.Selvapriya, S.Shanmuganathan, AP/CSE 17

CS 3491 – ARTIFICIAL INTELLIGENCE AND MACHINE LEARNING UNIT 3

Output:

Figure 3.2 – Example for Regression Line

2.3 Types of Linear Regression
It is of two types: Simple and Multiple.
o Simple Linear Regression is where only one independent variable is
present and the model has to find the linear relationship of it with the
dependent variable
Equation of Simple Linear Regression, where bo is the intercept, b1 is
coefficient or slope, x is the independent variable and y is the dependent
variable.

o In Multiple Linear Regression there are more than one independent

2.4 Linear Regression Model

 A Linear Regression model’s main aim is to find the best fit linear line and
the optimal values of intercept and coefficients such that the error is
minimized.
 Error is the difference between the actual value and Predicted value and
the goal is to reduce this difference.

PREPARED BY: R.Mohan, G.Selvapriya, S.Shanmuganathan, AP/CSE 18

CS 3491 – ARTIFICIAL INTELLIGENCE AND MACHINE LEARNING UNIT 3

Figure 3.3 – Example for Linear Regression Model

 In the above figure 3.3,

 x is our dependent variable which is plotted on the x-axis and y is the
dependent variable which is plotted on the y-axis.
 Black dots are the data points i.e the actual values.
 bo is the intercept which is 10 and b1 is the slope of the x variable.
 The blue line is the best fit line predicted by the model i.e the predicted
values lie on the blue line.
 The vertical distance between the data point and the regression line is
known as error or residual.
 Each data point has one residual and the sum of all the differences is known
as the Sum of Residuals/Errors.

Mathematical Approach:
Residual/Error = Actual values – Predicted Values
Sum of Residuals/Errors = Sum(Actual- Predicted Values)
Square of Sum of Residuals/Errors = (Sum(Actual- Predicted Values))2

2.5 Bayesian Regression

PREPARED BY: R.Mohan, G.Selvapriya, S.Shanmuganathan, AP/CSE 19

CS 3491 – ARTIFICIAL INTELLIGENCE AND MACHINE LEARNING UNIT 3

 The aim of Bayesian Linear Regression is to find the ‘posterior‘

distribution for the model parameters.
 The expression for Posterior is :

 The Bayesian Ridge Regression formula is as follows:

p(y|λ) = N(w|0, λ^-1Ip)
where
 'y' is the expected value,
 lambda is the distribution's shape parameter before the lambda
parameter
 the vector "w" is made up of the elements w0, w1,....

2.5.1 Implementation Of Bayesian Regression Using Python

 Boston Housing dataset, which includes details on the average price of
homes in various Boston neighborhoods.
 The r2 score will be used for evaluation.
 The crucial components of a Bayesian Ridge Regression model:

Program
fromsklearn.datasets import load_boston
fromsklearn.model_selection import train_test_split
fromsklearn.metrics import r2_score
fromsklearn.linear_model import BayesianRidge

# Loading the dataset

dataset = load_boston()
X, y = dataset.data, dataset.target

# Splitting the dataset into testing and training sets

X_train, X_test, y_train, y_test = train_test_split
(X, y, test_size = 0.15, random_state = 42)

PREPARED BY: R.Mohan, G.Selvapriya, S.Shanmuganathan, AP/CSE 20

CS 3491 – ARTIFICIAL INTELLIGENCE AND MACHINE LEARNING UNIT 3

# Creating to train the model

model = BayesianRidge()
model.fit(X_train, y_train)

# Model predicting the test data

prediction = model.predict(X_test)

# Evaluation of r2 score of the model against the test dataset

print(f"Test Set r2 score : {r2_score(y_test, prediction)}")

Output
Test Set r2 score : 0.7943355984883815

Advantages of Bayesian Regression:

Disadvantages of Bayesian Regression:

 The inference of the model can be time-consuming.
 If there is a large amount of data available for our dataset, the Bayesian
approach is not worth it.

2.6 Gradient descent

2.6.1 Cost Function
 The cost is the error in our predicted value.
 It is calculated using the Mean Squared Error function as shown in figure
3.4.

PREPARED BY: R.Mohan, G.Selvapriya, S.Shanmuganathan, AP/CSE 21

CS 3491 – ARTIFICIAL INTELLIGENCE AND MACHINE LEARNING UNIT 3

Figure 3.4 – Example for Cost function

 The goal is to minimize the cost as much as possible in order to find the
best fit line.

2.6.2 Gradient Descent Algorithm.

 Gradient descent is an optimization algorithm that finds the best-fit line
for a given training dataset in a smaller number of iterations.
 If m and c are plotted against MSE, it will acquire a bowl shape as shown
in figure 3.4a and figure 3.4b.

Figure 3.4a – Process of gradient descent algorithm

PREPARED BY: R.Mohan, G.Selvapriya, S.Shanmuganathan, AP/CSE 22

CS 3491 – ARTIFICIAL INTELLIGENCE AND MACHINE LEARNING UNIT 3

Figure 3.4b – Gradient Descent Shape

Learning Rate
 A learning rate is used for each pair of input and output values. It is a
scalar factor and coefficients are updated in direction towards minimizing
error.
 The process is repeated until a minimum sum squared error is achieved or
no further improvement is possible.

Step by Step Algorithm:

1. Initially, let m = 0, c = 0
Where L = learning rate — controlling how much the value of “m” changes
with each step.
The smaller the L, greater the accuracy. L = 0.001 for a good accuracy.
2. Calculating the partial derivative of loss function “m” to get the
derivative D.

PREPARED BY: R.Mohan, G.Selvapriya, S.Shanmuganathan, AP/CSE 23

CS 3491 – ARTIFICIAL INTELLIGENCE AND MACHINE LEARNING UNIT 3

 Similarly, find the partial derivative with respect to c, Dc..

 Update the current values of m and c using the following equation:

 Repeat this process until our Cost function is very small (ideally 0).

3. Explain in detail about Linear Classification Models – Discriminant

function.

LINEAR CLASSIFICATION MODELS – DISCRIMINANT FUNCTION.

1.1 Linear Classification Models
1.2 Types of ML Classification Algorithms
1.3 Discriminant function

3.1 Linear Classification Models

 The Classification algorithm is a Supervised Learning technique that
is used to identify the category of new observations on the basis of
training data.
 In Classification, a program learns from the given dataset or
observations and then classifies new observation into a number of
classes or groups. Such as, Yes or No, 0 or 1, Spam or Not Spam,
cat or dog, etc.

PREPARED BY: R.Mohan, G.Selvapriya, S.Shanmuganathan, AP/CSE 24

CS 3491 – ARTIFICIAL INTELLIGENCE AND MACHINE LEARNING UNIT 3

 Classes can be called as targets/labels or categories.

 The output variable of Classification is a category, not a value, such
as "Green or Blue", "fruit or animal", etc.
 Since the Classification algorithm is a supervised learning
technique, hence it takes labeled input data, which means it
contains input with the corresponding output.
 In classification algorithm, a discrete output function(y) is mapped
to input variable(x).
y=f(x), where y = categorical output

 The best example of an ML classification algorithm is Email Spam

Detector.
 The goal of the classification algorithm is
o Take a D-dimensional input vector x
o Assign it to one of K discrete classes Ck , k = 1, . . . , K
 In the most common scenario, the classes are taken to be disjoint
and each input is assigned to one and only one class
 The input space is divided into decision regions
 The boundaries of the decision regions
o decision boundaries
o decision surfaces
 With linear models for classification, the decision surfaces are linear
functions, Classes that can be separated well by linear surfaces are
linearly separable.
 In the figure 3.5, there are two classes, class A and Class B.
 These classes have features that are similar to each other and
dissimilar to other classes.

Figure 3.5 – Example of Classification

PREPARED BY: R.Mohan, G.Selvapriya, S.Shanmuganathan, AP/CSE 25

CS 3491 – ARTIFICIAL INTELLIGENCE AND MACHINE LEARNING UNIT 3

 The algorithm which implements the classification on a dataset is

known as a classifier.
 There are two types of Classifications:
o Two-class problems :
o Binary representation or Binary Classifier:
o If the classification problem has only two possible outcomes,
then it is called as Binary Classifier.
o There is a single target variable t ∈ {0, 1}
o t = 1 represents class C1
o t = 0 represents class C2
o Examples: YES or NO, MALE or FEMALE, SPAM or NOT
SPAM, CAT or DOG, etc.
o Multi-class Problems:
o If a classification problem has more than two outcomes,
then it is called as Multi-class Classifier.
o Example: Classifications of types of crops, Classification of
types of music.
o 1-of-K coding scheme
o There is a K-long target vector t, such that
If the class is Cj, all elements tk of t are zero for k ≠ j and
one for k = j tk is the probability that the class is Ck, K = 6
and Ck = 4, then t = (0, 0, 0, 1, 0, 0)T

 The simplest approach to classification problems is through

construction of a discriminant function that directly assigns each
vector x to a specific class

3.2 Types of ML Classification Algorithms:

 Logistic Regression
 K-Nearest Neighbors
 Support Vector Machines
 Kernel SVM
 Naïve Bayes
 Decision Tree Classification
 Random Forest Classification

1.4 Discriminant function

 A function of a set of variables that is evaluated for samples of events
or objects and used as an aid in discriminating between or classifying
them.

PREPARED BY: R.Mohan, G.Selvapriya, S.Shanmuganathan, AP/CSE 26

CS 3491 – ARTIFICIAL INTELLIGENCE AND MACHINE LEARNING UNIT 3

 A discriminant function (DF) maps independent (discriminating)

variables into a latent variable D.
 DF is usually postulated to be a linear function:
D = a0 + a1 x1 + a2 x2 ... aNxN

 The goal of discriminant analysis is to find such values of the

coefficients {ai, i=0,...,N} that the distance between the mean values of
DF is maximal for the two groups.
 Whenever there is a requirement to separate two or more classes
having multiple features efficiently, the Linear Discriminant Analysis
model is considered the most common technique to solve such
classification problems.
 For example, if there are classes with multiple features and need to
separate them efficiently. Classify them using a single feature, then it
may show overlapping as shown in figure 3.6.

Figure 3.6 – Example for Classification using single feature

 To overcome the overlapping issue in the classification process, must
increase the number of features regularly.

4. Explain in detail about Linear Discriminant Functions and its types. Also
elaborate about logistic regression in detail.

LINEAR DISCRIMINANT FUNCTIONS AND LOGISTIC REGRESSION

4.1 Linear Discriminant Functions
4.2 The Two-Category Case
4.3 The Multi-category Case
4.4 Generalized Linear Discriminant Functions
4.5 Probabilistic discriminative models
4.6 Logistics Regression
4.6.1 Logistic Function (Sigmoid Function)
4.6.2 Assumptions for Logistic Regression
4.6.3 Logistic Regression Equation
4.6.4 Type of Logistic Regression
4.6.5 Steps in Logistic Regression
4.6.6 Advantages of Logistic Regression Algorithm

PREPARED BY: R.Mohan, G.Selvapriya, S.Shanmuganathan, AP/CSE 27

CS 3491 – ARTIFICIAL INTELLIGENCE AND MACHINE LEARNING UNIT 3

4.1 Linear Discriminant Functions

A discriminant function that is a linear combination of the
components of x can be written as

(3.1)
where w is the weight vector and w0 the bias or threshold weight.

4.2The Two-Category Case

 For a discriminant function of the form of eq.3.1, a two-category
classifier implements the following decision rule:
 Decide w1 if g(x)>0 and w2 if g(x)<0.
 Thus, x is assigned to w1 if the inner product wTx exceeds the
threshold – w0 and to w2 otherwise.
 If g(x)=0, x can ordinarily be assigned to either class, or can be left
undefined.
 The equation g(x)=0 defines the decision surface that separates
points assigned to w1 from points assigned to w2.
 When g(x) is linear, this decision surface is a hyperplane.
 If x1 and x 2 are both on the decision surface, then

(3.2)
or

(3.3)

Figure 3.7: The linear decision boundary H separates the feature

space into two half-spaces.

PREPARED BY: R.Mohan, G.Selvapriya, S.Shanmuganathan, AP/CSE 28

CS 3491 – ARTIFICIAL INTELLIGENCE AND MACHINE LEARNING UNIT 3

 In figure 3.7, the hyperplane H divides the feature space into two
half-spaces:
o Decisionregion R1 for w1
o region R2 for w2.
 The discriminant function g(x) gives an algebraic measure of the
distance from x to the hyperplane.
 The way to express x as

(3.4)
where x p is the normal projection of x onto H, and r is the desired
algebraic distance which is positive if x is on the positive side and
negative if x is on the negative side. Then, because g(x p)=0,

Since then

(3.5)

(3.6)

o The distance from the origin to H is given by .

o If w0>0, the origin is on the positive side of H, and if w0<0, it is on the
negative side.

o If w0=0, then g(x) has the homogeneous form , and the hyperplane
passes through the origin

PREPARED BY: R.Mohan, G.Selvapriya, S.Shanmuganathan, AP/CSE 29

CS 3491 – ARTIFICIAL INTELLIGENCE AND MACHINE LEARNING UNIT 3

4.3 The Multi-category Case

 To devise multi category classifiers employing linear discriminant
functions reduce the problem to c two-class problems.
 Defining c linear discriminant functions

(3.7)

and assigning x to wi if for all j¹ i; in case of ties, the

classification is left undefined.
 The resulting classifier is called a linear machine.

Figure 3.8: Decision boundaries defined by linear machines

 A linear machine divides the feature space into c decision regions as

shown in figure 3.8, with gj(x) being the largest discriminant if x is in
region Ri.
 If Ri and Rj are contiguous, the boundary between them is a portion of
the hyperplane Hij defined by

is normal to Hij and the signed distance from x to Hij is given by

4.4 Generalized Linear Discriminant Functions

 The linear discriminant function g(x) can be written as

(3.8)

where the coefficients wi are the components of the weight vector w.

PREPARED BY: R.Mohan, G.Selvapriya, S.Shanmuganathan, AP/CSE 30

CS 3491 – ARTIFICIAL INTELLIGENCE AND MACHINE LEARNING UNIT 3

Quadratic Discriminant Function

(3.9)

4.5Probabilistic discriminative models

 Discriminative models are a class of supervised machine learning
models which make predictions by estimating conditional
probability P(y|x).
 For the two-class classification problem, the posterior probability of
classC1 can be written as a logistic sigmoid acting on a linear function of
x

 For the multi-class case, the posterior probability of class Ckis given by a
softmax transformation of a linear function of x

4.6 Logistics Regression

o Logistic regression is the Machine Learning algorithms, under the
classification algorithm of Supervised Learning technique.
o Logistic regression is used to describe data and the relationship between one
dependent variable and one or more independent variables.
o The independent variables can be nominal, ordinal, or of interval type.
o Logistic regression predicts the output of a categorical dependent variable.
o Therefore the outcome must be a categorical or discrete value.
o It can be either Yes or No, 0 or 1, true or False, etc. it gives the probabilistic
values which lie between 0 and 1.
o Linear Regression is used for solving Regression problems, whereas Logistic
regression is used for solving the classification problems.
o The figure 3.9 predicts the logistic function

PREPARED BY: R.Mohan, G.Selvapriya, S.Shanmuganathan, AP/CSE 31

CS 3491 – ARTIFICIAL INTELLIGENCE AND MACHINE LEARNING UNIT 3

Figure 3.9 – Logistic Function or Sigmoid Function

4.6.1 Logistic Function (Sigmoid Function):

o The logistic function is also known as the sigmoid function.
o The sigmoid function is a mathematical function used to map the
predicted values to probabilities.
o The value of the logistic regression must be between 0 and 1, so it forms
a curve like the "S" form.
o The S-form curve is called the Sigmoid function or the logistic function.
4.6.2 Assumptions for Logistic Regression:
o The dependent variable must be categorical in nature.
o The independent variable should not have multi-collinearity.
4.6.3 Logistic Regression Equation:
 The Logistic regression equation can be obtained from the Linear
Regression equation.
 The mathematical steps are given below:
 The equation of the straight line can be written as:

o In Logistic Regression y can be between 0 and 1 only, let's divide the

above equation by (1-y):

o For the range between -[infinity] to +[infinity], take logarithm of the

equation:

The above equation is the final equation for Logistic Regression.

PREPARED BY: R.Mohan, G.Selvapriya, S.Shanmuganathan, AP/CSE 32

CS 3491 – ARTIFICIAL INTELLIGENCE AND MACHINE LEARNING UNIT 3

4.6.4 Type of Logistic Regression:

4.6.5 Steps in Logistic Regression:

4.6.6 Advantages of Logistic Regression Algorithm

 Logistic regression performs better when the data is linearly separable
 It does not require too many computational resources
 There is no problem scaling the input features
 It is easy to implement and train a model using logistic regression

5. Elaborate in detail about Probabilistic Generative model and Naïve

Bayes.

PROBABILISTIC GENERATIVE MODEL AND NAÏVE BAYES

5.1 Probabilistic Generative model
5.2 Simple example
5.3 Generative models
5.4 Discriminative models

5.1 Probabilistic Generative model

 Given a model of one conditional probability, and estimated probability
distributions for the variables X and Y, denoted P(X) and P(Y), can
estimate the conditional probability using Bayes' rule:

PREPARED BY: R.Mohan, G.Selvapriya, S.Shanmuganathan, AP/CSE 33

CS 3491 – ARTIFICIAL INTELLIGENCE AND MACHINE LEARNING UNIT 3

 A generative model isa statistical model of the joint probability

distribution on given observable variable X and target variable Y.
Given a generative model for P(X|Y), can estimate:

 A discriminative model is a model of the conditional probability of the

target Y, given an observation xgiven a discriminative model forP(Y|X),
can estimate:

 Classifier based on a generative model is a generative classifier, while a

classifier based on a discriminative model is a discriminative classifier

5.2 Simple example

5.3 Generative models

Types of generative models are:
 Naive Bayes classifier or Bayesian network
 Linear discriminant analysis

5.4 Discriminative models

 Logistic regression
 Support Vector Machines
 Decision Tree Learning
 Random Forest

PREPARED BY: R.Mohan, G.Selvapriya, S.Shanmuganathan, AP/CSE 34

CS 3491 – ARTIFICIAL INTELLIGENCE AND MACHINE LEARNING UNIT 3

6. Elaborate in detail about Support Vector Machine (SVM).

SUPPORT VECTOR MACHINE

6.1 Support Vector Machine (SVM)
6.2 Cost Function and Gradient Updates
6.2.1 Hinge loss function
6.3 SVM Kernel
6.4 Types of SVMs
6.5 Advantages of SVM
6.6 Disadvantages
6.7 Applications

6.1 Support Vector Machine (SVM)

 Support Vector Machine(SVM) is a supervised machine learning
algorithm used for both classification and regression.
 The objective of SVM algorithm is to find a hyperplane in an N-
dimensional space that distinctly classifies the data points.
 Hyperplanes are decision boundaries that help classify the data points.
 The dimension of the hyperplane depends upon the number of features.
 If the number of input features is 2, then the hyperplane is just a line.
 If the number of input features is 3, then the hyperplane becomes a two-
dimensional plane.
 It becomes difficult to imagine when the number of features exceeds 3.
 The objective is to find a plane that has the maximum margin, i.e the
maximum distance between data points of both classes.
 Support vectors are data points that are closer to the hyperplane and
influence the position and orientation of the hyperplane.
 Using these support vectors, can maximize the margin of the classifier.
 Deleting the support vectors will change the position of the hyperplane.
 Example Refer Figure 3.10

PREPARED BY: R.Mohan, G.Selvapriya, S.Shanmuganathan, AP/CSE 35

CS 3491 – ARTIFICIAL INTELLIGENCE AND MACHINE LEARNING UNIT 3

Figure 3.10 – Example for Support Vectors

 Let’s consider two independent variables x1, x2 and one dependent

variable which is either a blue circle or a red box as shown in figure
3.11.

Figure 3.11 - Linearly Separable Data points

PREPARED BY: R.Mohan, G.Selvapriya, S.Shanmuganathan, AP/CSE 36

CS 3491 – ARTIFICIAL INTELLIGENCE AND MACHINE LEARNING UNIT 3

6.2 Cost Function and Gradient Updates

 In the SVM algorithm, to maximize the margin between the data points
and the hyperplane, the loss function helps to maximize the margin is
called hinge loss.

6.2.1 Hinge loss function

 The cost is 0 if the predicted value and the actual value are of the same
sign. If they are not, then calculate the loss value.
 The objective of the regularization parameter is to balance the margin
maximization and loss.
 After adding the regularization parameter, the cost functions looks as
below.

6.3 SVM Kernel:

6.4 Types of SVMs

6.5 Advantages of SVM:

 Effective on datasets with multiple features, like financial or medical
data.
 Effective in cases where number of features is greater than the number
of data points.
 Its memory efficient as it uses a subset of training points in the decision
function called support vectors

PREPARED BY: R.Mohan, G.Selvapriya, S.Shanmuganathan, AP/CSE 37

CS 3491 – ARTIFICIAL INTELLIGENCE AND MACHINE LEARNING UNIT 3

 Different kernel functions can be specified for the decision functions and
its possible to specify custom kernels

6.6 Disadvantages
 If the number of features is a lot bigger than the number of data points,
choosing kernel functions and regularization term is crucial.
 SVMs don't directly provide probability estimates. Those are calculated
using an expensive five-fold cross-validation.
 Works best on small sample sets because of its high training time.

6.7 Applications
SVMs can be used to solve various real-world problems:
 SVMs are helpful in text and hypertext categorization.
 Classification of images can also be performed using SVMs.
 Classification of satellite data like SAR data using supervised SVM.
 Hand-written characters can be recognized using SVM.
 The SVM algorithm has been widely applied in the biological and other
sciences. They have been used to classify proteins with up to 90% of the
compounds classified correctly.

7. Elaborate in detail about Decision Tree in Supervised Learning.

DECISION TREE IN SUPERVISED LEARNING

7.1 Decision Tree
7.2 Types of Decision Trees
7.3 Reason for using Decision Trees
7.4 Decision Tree Terminologies
7.5 Working of Decision Tree algorithm
7.6 Algorithms used to construct Decision Trees
7.7 Attribute Selection Measures
7.7.1 Entropy
7.7.2.Information Gain
7.7.3.Gini Index
7.7.4 Gain Ratio
7.7.5 Reduction in variance
7.7.6 Chi-Square
7.8. Avoid/counter Over fitting in Decision Trees
7.8.1 Pruning Decision Trees
7.8.2 Random Forest
7.9 Advantages of the Decision Tree
7.10 Disadvantages of the Decision Tree

PREPARED BY: R.Mohan, G.Selvapriya, S.Shanmuganathan, AP/CSE 38

CS 3491 – ARTIFICIAL INTELLIGENCE AND MACHINE LEARNING UNIT 3

7.1 Decision Tree

 Decision Tree is a supervised learning technique that can be used for
both classification and Regression problems.
 It is a tree-structured classifier, where internal nodes represent the
features of a dataset, branches represent the decision rules and each leaf
node represents the outcome.
 In a Decision tree, there are two nodes, the Decision Node and Leaf
Node.
 As shown in figure 3.12, Decision nodes are used to make any decision
and have multiple branches, whereas Leaf nodes are the output of those
decisions and do not contain any further branches.
 Example for Decision Tree Refer Figure 3.13
 The goal of using a Decision Tree is to create a training model that can
use to predict the class or value of the target variable by learning simple
decision rules inferred from prior data(training data).
 In order to build a tree, use the CART algorithm, which stands
for Classification and Regression Tree algorithm.

Figure 3.12 – Decision Tree Structure

PREPARED BY: R.Mohan, G.Selvapriya, S.Shanmuganathan, AP/CSE 39

CS 3491 – ARTIFICIAL INTELLIGENCE AND MACHINE LEARNING UNIT 3

Figure 3.13 – Decision Tree Example

7.2 Types of Decision Trees

1. Categorical Variable Decision Tree: Decision Tree which has a
categorical target variable then it called a Categorical variable decision
tree.
2. Continuous Variable Decision Tree: Decision Tree has a continuous
target variable then it is called Continuous Variable Decision Tree.

7.3 Reason for using Decision Trees

7.4 Decision Tree Terminologies

 Root Node: Root node is from where the decision tree starts. It represents
the entire dataset, which further gets divided into two or more homogeneous
sets.
 Leaf Node: Leaf nodes are the final output node, and the tree cannot be
segregated further after getting a leaf node.
 Splitting: Splitting is the process of dividing the decision node/root node
into sub-nodes according to the given conditions.
 Branch/Sub Tree: A tree formed by splitting the tree.

PREPARED BY: R.Mohan, G.Selvapriya, S.Shanmuganathan, AP/CSE 40

CS 3491 – ARTIFICIAL INTELLIGENCE AND MACHINE LEARNING UNIT 3

 Pruning: Pruning is the process of removing the unwanted branches from

the tree.
 Parent/Child node: The root node of the tree is called the parent node,
and other nodes are called the child nodes.

7.5 Working of Decision Tree algorithm

 In a decision tree, for predicting the class of the given dataset, the
algorithm starts from the root node of the tree.
 This algorithm compares the values of root attribute with the record (real
dataset) attribute and, based on the comparison, follows the branch and
jumps to the next node.
 For the next node, the algorithm again compares the attribute value with
the other sub-nodes and move further.
 It continues the process until it reaches the leaf node of the tree.
 The complete process can be better understood using the below algorithm:
Step-1: Begin the tree with the root node, says S, which contains the
complete dataset.
Step-2: Find the best attribute in the dataset using Attribute
Selection Measure (ASM).
Step-3: Divide the S into subsets that contains possible values for the
best attributes.
Step-4: Generate the decision tree node, which contains the best
attribute.
Step-5: Recursively make new decision trees using the subsets of the
dataset created in step -3. Continue this process until a stage
is reached where cannot further classify the nodes and called
the final node as a leaf node.

Example:
 Suppose there is a candidate who has a job offer and wants to decide
whether he should accept the offer or Not. So, to solve this problem, the
decision tree starts with the root node (Salary attribute by ASM). The root
node splits further into the next decision node (distance from the office) and
one leaf node based on the corresponding labels. The next decision node
further gets split into one decision node (Cab facility) and one leaf node.
Finally, the decision node splits into two leaf nodes (Accepted offers and
Declined offer). Consider the below diagram: Refer fig 3.14

PREPARED BY: R.Mohan, G.Selvapriya, S.Shanmuganathan, AP/CSE 41

CS 3491 – ARTIFICIAL INTELLIGENCE AND MACHINE LEARNING UNIT 3

Decision Tree

Figure 3.14 – Decision Tree Algorithm Example

7.6 Algorithms used to construct Decision Trees:

 ID3 → (extension of D3)
 C4.5 → (successor of ID3)
 CART → (Classification And Regression Tree)
 CHAID → (Chi-square automatic interaction detection Performs
multi-level splits when computing classification trees)
 MARS → (multivariate adaptive regression splines)

7.7Attribute Selection Measures

 While implementing a Decision tree, Attribute selection measure
orASM is used to select the best attribute for the nodes of the tree.
1. Entropy,
2. Information gain,
3. Gini index,
4. Gain Ratio,
5. Reduction in Variance
6. Chi-Square

PREPARED BY: R.Mohan, G.Selvapriya, S.Shanmuganathan, AP/CSE 42

CS 3491 – ARTIFICIAL INTELLIGENCE AND MACHINE LEARNING UNIT 3

7.7.1 Entropy:
 Entropy is a metric to measure the impurity in a given
attribute.
 Entropy is a measure of the randomness in the information
being processed.
 The higher the entropy, the harder it is to draw any conclusions
from that information.
 Flipping a coin is an example of an action that provides
information that is random.
 Entropy can be calculated as:

Entropy(S) = -P(yes)log2 P(yes)- P(no) log2 P(no)

Where,
o S= Total number of samples
o P(yes)= probability of yes
o P(no)= probability of no

7.7.2. Information Gain:

 Information gain or IG is a statistical property that measures
how well a given attribute separates the training examples
according to their target classification. Example Refer Fig 3.15.

Figure 3.15 – Information Gain Example

 Constructing a decision tree is all about finding an attribute

that returns the highest information gain and the smallest
entropy.
 Information gain is a decrease in entropy.
 It computes the difference between entropy before split and
average entropy after split of the dataset based on given
attribute values.
 It can be calculated using the below formula:

PREPARED BY: R.Mohan, G.Selvapriya, S.Shanmuganathan, AP/CSE 43

CS 3491 – ARTIFICIAL INTELLIGENCE AND MACHINE LEARNING UNIT 3

Information Gain= Entropy(S)-

[(Weighted Avg) *Entropy(each feature)]

7.7.3. Gini Index:

 Gini index as a cost function used to evaluate splits in the dataset.
 It is calculated by subtracting the sum of the squared probabilities
of each class from one.
 It favors larger partitions and easy to implement whereas
information gain favors smaller partitions with distinct values.
 Gini index can be calculated using the below formula:

7.7.4 Gain Ratio

 Information gain is biased towards choosing attributes with a large
number of values as root nodes.
 Gain ratio overcomes the problem with information gain by taking
the intrinsic information of a split into account.

7.7.5 Reduction in variance

 Reduction in variance is an algorithm that uses the standard
formula of variance to choose the best split.
 The split with lower variance is selected as the criteria to split the
population:

7.7.6 Chi-Square
 The acronym CHAID stands for Chi-squared Automatic Interaction
Detector.
 It finds out the statistical significance between the differences
between sub-nodes and parent node.

PREPARED BY: R.Mohan, G.Selvapriya, S.Shanmuganathan, AP/CSE 44

CS 3491 – ARTIFICIAL INTELLIGENCE AND MACHINE LEARNING UNIT 3

 Higher the value of Chi-Square higher the statistical significance of

differences between sub-node and Parent node.
 It generates a tree called CHAID (Chi-square Automatic Interaction
Detector).
 Mathematically, Chi-squared is represented as:

7.8. Avoid/counter Overfitting in Decision Trees

o Two ways to remove overfitting:
o Pruning Decision Trees.
o Random Forest

7.8.1 Pruning Decision Trees

 Pruning is a process of deleting the unnecessary nodes from a tree
in order to get the optimal decision tree.
 A too-large tree increases the risk of over fitting, and a small tree
may not capture all the important features of the dataset.
 Therefore, a technique that decreases the size of the learning tree
without reducing accuracy is known as Pruning.
 There are mainly two types of tree pruning technology used:
o Cost Complexity Pruning
o Reduced Error Pruning.

7.8.2 Random Forest

 Random Forest is an example of ensemble learning, in which we
combine multiple machine learning algorithms to obtain better
predictive performance.
 The name random means
 A random sampling of training data set when building trees.
 Random subsets of features considered when splitting nodes.

PREPARED BY: R.Mohan, G.Selvapriya, S.Shanmuganathan, AP/CSE 45

CS 3491 – ARTIFICIAL INTELLIGENCE AND MACHINE LEARNING UNIT 3

7.9Advantages of the Decision Tree

 It is simple to understand as it follows the same process which a
human follow while making any decision in real-life.
 It can be very useful for solving decision-related problems.
 It helps to think about all the possible outcomes for a problem.
 There is less requirement of data cleaning compared to other
algorithms.

7.10 Disadvantages of the Decision Tree

 The decision tree contains lots of layers, which makes it complex.
 It may have an over fitting issue, which can be resolved using
the Random Forest algorithm.
 For more class labels, the computational complexity of the decision
tree may increase.

8. Elaborate in detail about Random Forest in Supervised Learning.

RANDOM FOREST
8.1 Random Forest
8.2 Steps in the working process of Random Forest
8.3 Need for Random Forest
8.4 Example:
8.5 Important Features of Random Forest
8.6 Applications of Random Forest
8.7 Advantages of Random Forest
8.8 Disadvantages of Random Forest
8.9 Difference between Decision Tree & Random Forest

8.1 Random Forest

 Random Forest is a classifier that contains a number of decision trees
on various subsets of the given dataset and takes the average to
improve the predictive accuracy of that dataset.
 Random Forest is a popular machine learning algorithm that belongs
to the supervised learning technique.
 It can be used for both Classification and Regression problems in ML.
 It is based on the concept of ensemble learning, which is a process
of combining multiple classifiers to solve a complex problem and to
improve the performance of the model.
 The greater number of trees in the forest leads to higher accuracy and
prevents the problem of over fitting.

PREPARED BY: R.Mohan, G.Selvapriya, S.Shanmuganathan, AP/CSE 46

CS 3491 – ARTIFICIAL INTELLIGENCE AND MACHINE LEARNING UNIT 3

8.2 Steps in the working process of Random Forest

 The Working process can be explained in the below steps and
diagram:
Step 1: In Random forest n number of random records are taken from
the data set having k number of records.
Step 2: Individual decision trees are constructed for each sample.
Step 3: Each decision tree will generate an output.
Step 4: Final output is considered based on Majority Voting or
Averaging for Classification and regression respectively.
8.3 Need for Random Forest
 It takes less training time as compared to other algorithms.
 It predicts output with high accuracy, even for the large dataset it
runs efficiently.
 It can also maintain accuracy when a large proportion of data is
missing.
8.4 Example:
 Suppose there is a dataset that contains multiple fruit images. So,
this dataset is given to the Random forest classifier. The dataset is
divided into subsets and given to each decision tree. During the
training phase, each decision tree produces a prediction result, and
when a new data point occurs, then based on the majority of results,
the Random Forest classifier predicts the final decision.

Figure 3.16 – Example for Random Forest

PREPARED BY: R.Mohan, G.Selvapriya, S.Shanmuganathan, AP/CSE 47

CS 3491 – ARTIFICIAL INTELLIGENCE AND MACHINE LEARNING UNIT 3

 In the above figure 3.16, the majority decision tree gives output as an
apple when compared to a banana, so the final output is taken as an
apple.

8.5 Important Features of Random Forest

1. Diversity-
Not all attributes/variables/features are considered while
making an individual tree, each tree is different.
2. Immune to the curse of dimensionality-
Since each tree does not consider all the features, the feature
space is reduced.
3. Parallelization-
Each tree is created independently out of different data and
attributes. This means that we can make full use of the CPU to
build random forests.
4. Train-Test split-
In a random forest we don’t have to segregate the data for train
and test as there will always be 30% of the data which is not
seen by the decision tree.
5. Stability-
Stability arises because the result is based on majority voting/
averaging.

8.6 Applications of Random Forest

1. Banking: Banking sector mostly uses this algorithm for the
identification of loan risk.
2. Medicine: With the help of this algorithm, disease trends and risks of
the disease can be identified.
3. Land Use: We can identify the areas of similar land use by this
algorithm.
4. Marketing: Marketing trends can be identified using this algorithm.

8.7 Advantages of Random Forest

 Random Forest is capable of performing both Classification and
Regression tasks.
 It is capable of handling large datasets with high dimensionality.
 It enhances the accuracy of the model and prevents the over fitting
issue.
8.8 Disadvantages of Random Forest
 Although random forest can be used for both classification and
regression tasks, it is not more suitable for Regression tasks.

PREPARED BY: R.Mohan, G.Selvapriya, S.Shanmuganathan, AP/CSE 48

CS 3491 – ARTIFICIAL INTELLIGENCE AND MACHINE LEARNING UNIT 3

8.9 Difference between Decision Tree & Random Forest

Decision trees Random Forest

Random forests are created from

Decision trees normally suffer from
subsets of data and the final output
the problem of overfitting if it’s
is based on average or majority
allowed to grow without any
ranking and hence the problem of
control.
overfitting is taken care of.

A single decision tree is faster in

It is comparatively slower.
computation.

When a data set with features is Random forest randomly selects

taken as input by a decision tree it observations, builds a decision tree
will formulate some set of rules to and the average result is taken. It
do prediction. doesn’t use any set of formulas.

PREPARED BY: R.Mohan, G.Selvapriya, S.Shanmuganathan, AP/CSE 49

Project Report On Artificial Intelligence
85% (47)
Project Report On Artificial Intelligence
23 pages
Telangana 8th Class Mathematics Text Book
78% (89)
Telangana 8th Class Mathematics Text Book
370 pages
PUPCET Practice Test Booklet
100% (1)
PUPCET Practice Test Booklet
44 pages
Cs3491-Artificial Intelligence and Machine Learning-819461728-Ai Unit 1
No ratings yet
Cs3491-Artificial Intelligence and Machine Learning-819461728-Ai Unit 1
73 pages
CS 3491 AIML UNIT 1 NOTES
No ratings yet
CS 3491 AIML UNIT 1 NOTES
74 pages
AI UNIT 1-1
No ratings yet
AI UNIT 1-1
73 pages
AI&ML NOTES Watermark
No ratings yet
AI&ML NOTES Watermark
220 pages
AI Notes_250304_095028
No ratings yet
AI Notes_250304_095028
59 pages
AI&ML NOTES Watermark
No ratings yet
AI&ML NOTES Watermark
220 pages
DS Xi Sec3
No ratings yet
DS Xi Sec3
101 pages
CSC 409 UPDATED LECTURE NOTE
No ratings yet
CSC 409 UPDATED LECTURE NOTE
31 pages
Artificial Intelligence and Machine Learning
No ratings yet
Artificial Intelligence and Machine Learning
5 pages
Ai Notes Jntuk r20 Unit 1
No ratings yet
Ai Notes Jntuk r20 Unit 1
17 pages
IET - Chapter 3
No ratings yet
IET - Chapter 3
27 pages
Department of Computer Science & Engineering Session 2020-2021
No ratings yet
Department of Computer Science & Engineering Session 2020-2021
125 pages
Unit 1: Types of AI Based On Capabilites
No ratings yet
Unit 1: Types of AI Based On Capabilites
10 pages
1- Business Artificial IntelegenceIntroduction
No ratings yet
1- Business Artificial IntelegenceIntroduction
25 pages
AI_XI_SEC3
No ratings yet
AI_XI_SEC3
47 pages
Artificial Intelligence (AI)
No ratings yet
Artificial Intelligence (AI)
28 pages
Artificial Intelligence
100% (1)
Artificial Intelligence
4 pages
1 Introduction
No ratings yet
1 Introduction
37 pages
22cs601-Artificial Intelligence and Machine Learning Nodes
No ratings yet
22cs601-Artificial Intelligence and Machine Learning Nodes
232 pages
IX-Unit-1-Q&A-Introduction of AI
No ratings yet
IX-Unit-1-Q&A-Introduction of AI
4 pages
Artificial_Intelligence_and_Deep_Learnin
No ratings yet
Artificial_Intelligence_and_Deep_Learnin
5 pages
Artificial-Intelligent
No ratings yet
Artificial-Intelligent
14 pages
jasc-february-3464
No ratings yet
jasc-february-3464
4 pages
Artificial Intelligence Seminar
No ratings yet
Artificial Intelligence Seminar
23 pages
AI English Notes
No ratings yet
AI English Notes
18 pages
Chapter 3 AI
No ratings yet
Chapter 3 AI
32 pages
Unit 1 AI Serach
No ratings yet
Unit 1 AI Serach
41 pages
Computer
No ratings yet
Computer
9 pages
ML Unit-1
No ratings yet
ML Unit-1
9 pages
Artificial Intelligence
No ratings yet
Artificial Intelligence
9 pages
GROUP WORK - 1 ET
No ratings yet
GROUP WORK - 1 ET
99 pages
Chapter 3
No ratings yet
Chapter 3
32 pages
UNIT 2
No ratings yet
UNIT 2
34 pages
Lec 1
No ratings yet
Lec 1
23 pages
Ai Proj
No ratings yet
Ai Proj
23 pages
Artificial Intelligence
No ratings yet
Artificial Intelligence
5 pages
Chapter 1 Artificial Intelligence
No ratings yet
Chapter 1 Artificial Intelligence
38 pages
HW_626710_1CH12&
No ratings yet
HW_626710_1CH12&
14 pages
MODULE 1 INTRODUCTION TO AI
No ratings yet
MODULE 1 INTRODUCTION TO AI
15 pages
Paper 4
No ratings yet
Paper 4
17 pages
ch-1.2
No ratings yet
ch-1.2
68 pages
AI Notes All Units
No ratings yet
AI Notes All Units
151 pages
1 Intro
No ratings yet
1 Intro
11 pages
1.4 Data Science Topic 2
No ratings yet
1.4 Data Science Topic 2
11 pages
AI Important
No ratings yet
AI Important
20 pages
Scou 220 Manual t10
No ratings yet
Scou 220 Manual t10
33 pages
AI 4.0 Masterclass (PDF - Io) PDF
No ratings yet
AI 4.0 Masterclass (PDF - Io) PDF
29 pages
Ai ML
No ratings yet
Ai ML
25 pages
Artificial Intelligence - ETT Reviewer
No ratings yet
Artificial Intelligence - ETT Reviewer
7 pages
History of Artificial Intelligence
No ratings yet
History of Artificial Intelligence
9 pages
AIppt
No ratings yet
AIppt
27 pages
Unit-1 AI
No ratings yet
Unit-1 AI
50 pages
AI vs ML
No ratings yet
AI vs ML
7 pages
ArtificialIntelligenceinArchitectureIntegrationintoArchitecturalDesign
No ratings yet
ArtificialIntelligenceinArchitectureIntegrationintoArchitecturalDesign
17 pages
11zon Removed Pages PDF
No ratings yet
11zon Removed Pages PDF
24 pages
AI Chapter One
No ratings yet
AI Chapter One
46 pages
3 Artificial Intelligence
No ratings yet
3 Artificial Intelligence
24 pages
Artificial Intelligence: A Beginner's Guide
From Everand
Artificial Intelligence: A Beginner's Guide
Park Windsor
No ratings yet
Machine Minds AI for all: An Ethical Intelligence & Responsible Revolution
From Everand
Machine Minds AI for all: An Ethical Intelligence & Responsible Revolution
aarat
No ratings yet
Industrial Automation, SDL
No ratings yet
Industrial Automation, SDL
65 pages
DEV LAB MANUAL
No ratings yet
DEV LAB MANUAL
35 pages
LCDF3 Chap 06 P1
No ratings yet
LCDF3 Chap 06 P1
54 pages
FULL SRM
No ratings yet
FULL SRM
80 pages
Data Exploration and Visualization - AD3301 - Hand Written Notes - Unit 2 - Visualizing Using Matplotlib
No ratings yet
Data Exploration and Visualization - AD3301 - Hand Written Notes - Unit 2 - Visualizing Using Matplotlib
39 pages
ie5504
No ratings yet
ie5504
34 pages
OEI354-IIA Syllabus
No ratings yet
OEI354-IIA Syllabus
2 pages
DOM Rev3 Block Diagram
No ratings yet
DOM Rev3 Block Diagram
2 pages
Architecture of Industrial Automation Systems
No ratings yet
Architecture of Industrial Automation Systems
27 pages
Profibus Quick Reference
No ratings yet
Profibus Quick Reference
20 pages
Real Time Concept Based On Industrial Automation
No ratings yet
Real Time Concept Based On Industrial Automation
14 pages
Payslip Nov2023
No ratings yet
Payslip Nov2023
1 page
Bus Protocol
No ratings yet
Bus Protocol
17 pages
Metal Oxide Semiconductor Field-Effect Transistor
No ratings yet
Metal Oxide Semiconductor Field-Effect Transistor
5 pages
Class04 X86assembly
No ratings yet
Class04 X86assembly
44 pages
Ieee Tii Rtsoa
No ratings yet
Ieee Tii Rtsoa
12 pages
Top 10 Companies in Industrial Automation 1
No ratings yet
Top 10 Companies in Industrial Automation 1
4 pages
Industrial Automation Refers To The Use of Control Systems
No ratings yet
Industrial Automation Refers To The Use of Control Systems
4 pages
Digital Fundamentals & Computer Architecture
No ratings yet
Digital Fundamentals & Computer Architecture
110 pages
4 DCSref
No ratings yet
4 DCSref
25 pages
What Is The Purpose of An Automated Conveyor System in Manufacturing
No ratings yet
What Is The Purpose of An Automated Conveyor System in Manufacturing
2 pages
Stucor Be3254 KG
No ratings yet
Stucor Be3254 KG
243 pages
Ring Counter in Digital Logic
No ratings yet
Ring Counter in Digital Logic
3 pages
There Are Many Types of Mass Flow Meters
No ratings yet
There Are Many Types of Mass Flow Meters
1 page
Combinational Logic-LOGIC GATES
No ratings yet
Combinational Logic-LOGIC GATES
12 pages
EE3302 - DLC 2 Marks With Answers
No ratings yet
EE3302 - DLC 2 Marks With Answers
14 pages
HYGROMETER
No ratings yet
HYGROMETER
11 pages
TOR ASEAN CSOs Virtual Consultative Sessions On AAFDM - Pujiono Centre
No ratings yet
TOR ASEAN CSOs Virtual Consultative Sessions On AAFDM - Pujiono Centre
6 pages
Robit Hyper 63GA Manual 2018 PDF
No ratings yet
Robit Hyper 63GA Manual 2018 PDF
9 pages
3 Periodical Test: Tle 9-Food Processing
50% (2)
3 Periodical Test: Tle 9-Food Processing
2 pages
Einvoice, June 24 For MR RASHAD HARILALL
No ratings yet
Einvoice, June 24 For MR RASHAD HARILALL
2 pages
Recital Program - Griswold & Meler Nov 2022
No ratings yet
Recital Program - Griswold & Meler Nov 2022
8 pages
Phenomenology
No ratings yet
Phenomenology
26 pages
MBA-Y22 Syllabus - Updated
No ratings yet
MBA-Y22 Syllabus - Updated
74 pages
Woman: I'll Be Back in The Office Early On Monday. Man: Right. A Good Weekend, Then
No ratings yet
Woman: I'll Be Back in The Office Early On Monday. Man: Right. A Good Weekend, Then
19 pages
Json - JSON Encoder and Decoder - Python 3.7.1rc1 Documentation
No ratings yet
Json - JSON Encoder and Decoder - Python 3.7.1rc1 Documentation
12 pages
LIST OF TEST Building Material
100% (1)
LIST OF TEST Building Material
4 pages
All AT-Wastegate Maint Recommendation
No ratings yet
All AT-Wastegate Maint Recommendation
1 page
Exercise 14.1: NCERT Solution For Class 9 Maths Chapter 14-Statistics
No ratings yet
Exercise 14.1: NCERT Solution For Class 9 Maths Chapter 14-Statistics
19 pages
Warehouse MIS
No ratings yet
Warehouse MIS
32 pages
OM XD 30 Filter Replacement Chart
No ratings yet
OM XD 30 Filter Replacement Chart
3 pages
Ode Introduction
No ratings yet
Ode Introduction
2 pages
Plato PPT by Dr. Lalnundika Hnamte
No ratings yet
Plato PPT by Dr. Lalnundika Hnamte
33 pages
B2B Sales Group
No ratings yet
B2B Sales Group
1 page
(Routledge Advances in Middle East and Islamic Studies) Maria Elena Paniconi - Bildungsroman and The Arab Novel Egyptian Intersections-Routledge (2022)
No ratings yet
(Routledge Advances in Middle East and Islamic Studies) Maria Elena Paniconi - Bildungsroman and The Arab Novel Egyptian Intersections-Routledge (2022)
245 pages
The Tell Tale Heart
No ratings yet
The Tell Tale Heart
44 pages
Strategic Structure Contract Law
0% (1)
Strategic Structure Contract Law
178 pages
Relevant Costs (Part 2) : F. M. Kapepiso
No ratings yet
Relevant Costs (Part 2) : F. M. Kapepiso
21 pages
Field Attachment Report
No ratings yet
Field Attachment Report
44 pages
Davis, Suzanne - Social and Scientific Influences Childrens Suggestibility
No ratings yet
Davis, Suzanne - Social and Scientific Influences Childrens Suggestibility
10 pages
Tort Law TRESPASS TO LAND
No ratings yet
Tort Law TRESPASS TO LAND
42 pages
The Angel Next Door Spoils Me Rotten v01 (Yen Press) (LuCaZ) - 5
No ratings yet
The Angel Next Door Spoils Me Rotten v01 (Yen Press) (LuCaZ) - 5
8 pages
Muse
No ratings yet
Muse
8 pages
Tc Quantum Field Intro
No ratings yet
Tc Quantum Field Intro
110 pages
Topik 5 - CUSTOMER EXPECTATION
No ratings yet
Topik 5 - CUSTOMER EXPECTATION
16 pages