AI Notes

Unit 1


Artificial Intelligence (AI) is the field of computer science dedicated to automating behaviour commonly associated with human intelligence. As with any complex science, it breaks down into many smaller concepts.
History of Artificial Intelligence
Artificial Intelligence is neither a new term nor a new technology for researchers; it is much older than you might imagine. There are even myths of mechanical men in ancient Greek and Egyptian mythology. The following milestones trace the journey of AI from its origins to the present day.
Maturation of Artificial Intelligence (1943-1952)
o Year 1943: The first work now recognized as AI was done by Warren McCulloch and Walter Pitts in 1943. They proposed a model of artificial neurons.
o Year 1949: Donald Hebb demonstrated an updating rule for modifying the connection strength between neurons. His rule is now called Hebbian learning.
o Year 1950: Alan Turing, an English mathematician and a pioneer of machine learning, published "Computing Machinery and Intelligence" in 1950, in which he proposed a test of a machine's ability to exhibit intelligent behaviour equivalent to human intelligence, now called the Turing test.

The birth of Artificial Intelligence (1952-1956)


o Year 1955: Allen Newell and Herbert A. Simon created the "first artificial intelligence program", named the "Logic Theorist". This program proved 38 of 52 mathematics theorems and found new, more elegant proofs for some of them.
o Year 1956: The term "Artificial Intelligence" was first adopted by the American computer scientist John McCarthy at the Dartmouth Conference. For the first time, AI was coined as an academic field.

At that time, high-level computer languages such as FORTRAN, LISP, and COBOL were being invented, and enthusiasm for AI was very high.

The golden years-Early enthusiasm (1956-1974)


o Year 1966: Researchers emphasized developing algorithms that could solve mathematical problems. Joseph Weizenbaum created the first chatbot, named ELIZA, in 1966.

o Year 1972: The first intelligent humanoid robot, named WABOT-1, was built in Japan.

The first AI winter (1974-1980)


o The period from 1974 to 1980 was the first AI winter. An AI winter is a period in which computer scientists face a severe shortage of government funding for AI research.

o During AI winters, public interest in artificial intelligence also declined.

A boom of AI (1980-1987)
o Year 1980: After the AI winter, AI came back with "expert systems": programs that emulate the decision-making ability of a human expert.

o Also in 1980, the first national conference of the American Association for Artificial Intelligence was held at Stanford University.

The second AI winter (1987-1993)


o The period from 1987 to 1993 was the second AI winter.

o Investors and governments again stopped funding AI research because of the high costs and inefficient results. Even expert systems such as XCON, which had initially been cost-effective, became too expensive to maintain.

The emergence of intelligent agents (1993-2011)


o Year 1997: In 1997, IBM's Deep Blue beat world chess champion Garry Kasparov, becoming the first computer to defeat a reigning world chess champion.

o Year 2002: For the first time, AI entered the home, in the form of the Roomba vacuum cleaner.

o Year 2006: By 2006, AI had reached the business world. Companies like Facebook, Twitter, and Netflix started using AI.

Deep learning, big data and artificial general intelligence (2011-present)


o Year 2011: In 2011, IBM's Watson won the quiz show Jeopardy!, where it had to answer complex questions as well as riddles. Watson proved that it could understand natural language and solve tricky questions quickly.

o Year 2012: Google launched the Android app feature "Google Now", which could provide predictive information to the user.

o Year 2014: In 2014, the chatbot "Eugene Goostman" won a competition based on the famous "Turing test".

o Year 2018: IBM's "Project Debater" debated complex topics with two master debaters and performed extremely well.

o Also in 2018, Google demonstrated "Duplex", an AI program acting as a virtual assistant that booked a hairdresser appointment over the phone; the person on the other end did not notice that she was talking to a machine.

Scope of AI (AI Careers)


Freshers should analyze their competencies and skills and choose an AI role with potential for upward mobility. The scope of Artificial Intelligence continues to grow thanks to new job roles and advancements in the field, in areas such as deep learning, big data, and artificial general intelligence.

Artificial intelligence can be defined as the study of rational agents. A rational agent can be anything that makes decisions, such as a person, firm, machine, or piece of software.
Examples of Agents:
 A software agent has keystrokes, file contents, and received network packets acting as sensors, and displays on the screen, files, and sent network packets acting as actuators.
 A human agent has eyes, ears, and other organs acting as sensors, and hands, legs, mouth, and other body parts acting as actuators.
 A robotic agent has cameras and infrared range finders acting as sensors, and various motors acting as actuators.

Types of Agents
Agents can be grouped into five classes based on their degree of perceived intelligence and capability:
 Simple Reflex Agents
 Model-Based Reflex Agents
 Goal-Based Agents
 Utility-Based Agents
 Learning Agents
Simple reflex agents
Simple reflex agents ignore the rest of the percept history and act only on the basis of the current percept. (Percept history is the history of everything an agent has perceived to date.) The agent function is based on condition-action rules.

A model-based agent can handle partially observable environments by using a model of the world. The agent must keep track of an internal state, which is adjusted by each percept and depends on the percept history.

Goal-based agents
These kinds of agents take decisions based on how far they are currently from
their goal (description of desirable situations).
Utility-based agents
These agents choose actions based on a utility function that measures how desirable each state is, so they can compare different ways of achieving a goal rather than merely distinguishing goal states from non-goal states.
Learning Agents:
A learning agent in AI is an agent that can learn from its past experiences; it has learning capabilities. It starts by acting with basic knowledge and then adapts automatically through learning.

The following are the four main rules for an AI agent:


o Rule 1: An AI agent must have the ability to perceive the environment.
o Rule 2: The observation must be used to make decisions.
o Rule 3: Decision should result in an action.
o Rule 4: The action taken by an AI agent must be a rational action.

Structure of an AI Agent
1. Agent = Architecture + Agent program

Architecture: The machinery that the AI agent executes on.

Agent function: A function used to map a percept to an action, as sketched below.
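To make the mapping concrete, here is a minimal sketch of an agent program in Python, using a hypothetical two-location vacuum-cleaner world (the locations, percepts, and actions below are illustrative assumptions, not part of these notes):

def agent_function(percept):
    """Maps a percept to an action using condition-action rules."""
    location, status = percept                    # e.g. ("A", "Dirty")
    if status == "Dirty":
        return "Suck"
    return "Right" if location == "A" else "Left"

# The "architecture" is the machinery that runs the program: here,
# just a loop feeding percepts to the agent program.
for percept in [("A", "Dirty"), ("A", "Clean"), ("B", "Dirty")]:
    print(percept, "->", agent_function(percept))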

PEAS Representation
PEAS is a model used to describe the task environment an AI agent works in, in terms of:
o P: Performance measure
o E: Environment
o A: Actuators
o S: Sensors

Agent Environment in AI
An environment is everything in the world which surrounds the agent, but it is not a
part of an agent itself. An environment can be described as a situation in which an
agent is present.

Features of Environment
As per Russell and Norvig, an environment can have various features from the point of
view of an agent:
1. Fully observable vs Partially Observable
2. Static vs Dynamic
3. Discrete vs Continuous
4. Deterministic vs Stochastic
5. Single-agent vs multi-agent
6. Episodic vs sequential
7. Known vs Unknown
8. Accessible vs Inaccessible

Fully observable vs Partially Observable:

o If an agent's sensors can sense or access the complete state of the environment at each point in time, the environment is fully observable; otherwise it is partially observable.
o If an agent has no sensors at all, the environment is called unobservable.

Deterministic vs Stochastic:
o If the agent's current state and selected action completely determine the next state of the environment, the environment is called deterministic.
o A stochastic environment is random in nature and cannot be determined completely by the agent.

Episodic vs Sequential:
o In an episodic environment, there is a series of one-shot actions, and only the current percept is required for each action.
o In a sequential environment, however, an agent requires memory of past actions to determine the next best action.

Single-agent vs Multi-agent
o If only one agent is involved in an environment and operates by itself, the environment is called a single-agent environment.
o If multiple agents are operating in an environment, it is called a multi-agent environment.

Static vs Dynamic:
o If the environment can change while the agent is deliberating, it is called a dynamic environment; otherwise it is called a static environment.

Discrete vs Continuous:
o If there is a finite number of percepts and actions that can be performed in an environment, it is called a discrete environment; otherwise it is called a continuous environment.

Known vs Unknown
o Known and unknown are not actually features of the environment itself; they describe the agent's state of knowledge about how to act in it.

Accessible vs Inaccessible
o If an agent can obtain complete and accurate information about the environment's state, the environment is called accessible; otherwise it is called inaccessible.

Problem formulation: This is one of the core steps of problem-solving; it decides what actions should be taken to achieve the formulated goal. In AI, this step is carried out by the software agent, which uses the following components to formulate the problem.
Components used to formulate a problem:
 Initial state: The state from which the AI agent starts working towards the specified goal.
 Actions: A function that returns all of the possible actions available to the agent in a given state.
 Transition model: A description of what each action does, i.e., the state that results from performing a given action in a given state.
 Goal test: Determines whether a given state is the goal state. Once the goal is reached, the search stops and the cost of achieving the goal is determined.
 Path cost: A numeric cost assigned to each path, determining what it will cost to achieve the goal; it covers all hardware, software, and human working costs.
Examples of Problem Formulation
Water Jug Problem
 Problem Statement
 Problem Definition: You have to measure exactly 4 liters (L) of water using three buckets of capacity 8L, 5L, and 3L.
 Problem Limitation: You can only use these (8L, 5L, and 3L) buckets.
 Problem Solution: Measure exactly 4L of water.
 Solution Space: There are multiple ways of doing this; a breadth-first sketch follows below.
 Operators: The possible actions are filling water into a bucket and removing water from a bucket.
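As one sketch of searching this solution space, the Python program below finds a sequence of pours by breadth-first search. It assumes the 8L bucket starts full and that the only move is pouring one bucket into another until it is full or the source is empty (assumptions, since the notes do not fix a starting state):

from collections import deque

CAP = (8, 5, 3)                      # bucket capacities
start, goal = (8, 0, 0), 4           # reach a state where some bucket holds 4L

def successors(state):
    """All states reachable by pouring bucket i into bucket j."""
    for i in range(3):
        for j in range(3):
            if i != j and state[i] > 0:
                amount = min(state[i], CAP[j] - state[j])
                if amount > 0:
                    s = list(state)
                    s[i] -= amount
                    s[j] += amount
                    yield tuple(s)

def solve():
    """Breadth-first search over bucket states; returns a shortest pour sequence."""
    frontier, seen = deque([[start]]), {start}
    while frontier:
        path = frontier.popleft()
        if goal in path[-1]:         # goal test: any bucket holds exactly 4L
            return path
        for nxt in successors(path[-1]):
            if nxt not in seen:
                seen.add(nxt)
                frontier.append(path + [nxt])

print(solve())   # a shortest pour sequence ending in a state containing 4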

Tree vs Graph data structure


Before looking at the tree and graph data structures, we should know about linear and non-linear data structures. A linear data structure is one in which all the elements are stored sequentially on a single level. In contrast, a non-linear data structure follows a hierarchy, i.e., elements are arranged on multiple levels.

Let's understand the structure that forms the hierarchy.


What is Tree?
A tree is a non-linear data structure that represents the hierarchy. A tree is a collection
of nodes that are linked together to form a hierarchy.

Let's look at some terminologies used in a tree data structure.


o Root node: The topmost node in a tree data structure is known as a root node. A root
node is a node that does not have any parent.
o Parent of a node: The immediate predecessor of a node is known as a parent of a
node. Here predecessor means the previous node of that particular node.
o Child of a node: The immediate successor of a node is known as a child of a node.
o Leaf node: The leaf node is a node that does not have any child node. It is also known
as an external node.
o Non-leaf node: A non-leaf node is a node that has at least one child node. It is also known as an internal node.
o Path: It is a sequence of the consecutive edges from a source node to the destination
node. Here edge is a link between two nodes.
o Ancestor: The predecessor nodes that occur in the path from the root to that node is
known as an ancestor.
o Descendant: The successor nodes that exist in the path from that node to the leaf
node.
o Sibling: All the children that have the same parent node are known as siblings.
o Degree: The number of children of a particular node is known as a degree.
o Depth of node: The length of the path from the root to that node is known as a depth
of a node.
o Height of a node: The number of edges that occur in the longest path from that node
to the leaf node is known as the height of a node.
o Level of node: The number of edges that exist from the root node to the given node
is known as a level of a node.
How is a tree represented in memory?
In a binary tree, each node contains three parts: a data part, the address of the left subtree, and the address of the right subtree. If a node does not have a child, the corresponding link part holds a NULL value.
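A minimal sketch of such a node in Python (the class and field names are illustrative):

class Node:
    def __init__(self, data):
        self.data = data      # data part
        self.left = None      # address of the left subtree (None plays the role of NULL)
        self.right = None     # address of the right subtree

root = Node(1)
root.left, root.right = Node(2), Node(3)
print(root.data, root.left.data, root.right.data)   # 1 2 3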
What is a Graph?
A graph, like a tree, is a collection of objects or entities known as nodes that are connected to each other through a set of edges. However, while a tree follows rules that determine the relationships between its nodes, a graph follows no rule defining the relationships among its nodes: the edges can connect the nodes in any possible way.

Mathematically, a graph can be defined as an ordered pair of a set of vertices and a set of edges, where the vertices are represented by 'V' and the edges by 'E':

G = (V, E)

Basis for comparison | Tree | Graph
Definition | A tree is a non-linear data structure in which elements are arranged in multiple levels. | A graph is also a non-linear data structure.
Structure | It is a collection of nodes and edges. For example, if a node is represented by N and an edge by E, a tree can be written as T = {N, E}. | It is a collection of vertices and edges. For example, if a vertex is represented by V and an edge by E, a graph can be written as G = {V, E}.
Root node | In the tree data structure, there is a unique node, known as the root node, which is the topmost node in the tree. | In the graph data structure, there is no such unique node.
Loop formation | It does not create any loop or cycle. | In a graph, loops or cycles can be formed.
Model type | It is a hierarchical model, because the nodes are arranged in multiple levels, creating a hierarchy. For example, any organization has a hierarchical model. | It is a network model. For example, Facebook is a social network that uses the graph data structure.
Edges | If there are n nodes, there are n-1 edges. | The number of edges depends on the graph.
Type of edge | A tree data structure always has directed edges. | In a graph, the edges can be directed, undirected, or both.
Applications | It is used for inserting, deleting, or searching for an element in the tree. | It is mainly used for finding the shortest path in a network.

There are two types of edges:
Directed edge: an edge with a direction, which can be traversed in only one way.
Undirected edge: an edge without a direction, which can be traversed in both directions.

There are two types of graphs:

Directed graph: A graph with directed edges is known as a directed graph.
Undirected graph: A graph with undirected edges is known as an undirected graph.

The state space representation forms the basis of most AI methods.

Its structure corresponds to the structure of problem solving in two important ways:

1. It allows for a formal definition of a problem, namely the need to convert some given situation into some desired situation using a set of permissible operations.
2. It permits the problem to be solved with the help of known techniques and control strategies that move through the problem space until a goal state is found.

Artificial Intelligence is the study of building agents that act rationally. Most
of the time, these agents perform some kind of search algorithm in the
background in order to achieve their tasks.

 A search problem consists of:
 A state space: the set of all possible states you can be in.
 A start state: the state from which the search begins.
 A goal test: a function that looks at the current state and returns whether or not it is the goal state.
 The solution to a search problem is a sequence of actions, called the plan, that transforms the start state into the goal state.
 This plan is found using search algorithms.

The following uninformed search algorithms are discussed in this section.


1. Depth First Search
2. Breadth First Search
3. Uniform Cost Search
Depth First Search:
Depth-first search (DFS) is an algorithm for traversing or searching tree or graph data structures. The algorithm starts at the root node (selecting some arbitrary node as the root in the case of a graph) and explores as far as possible along each branch before backtracking. It uses a last-in, first-out strategy and hence is implemented using a stack.

Path: S -> A -> B -> C -> G
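Since the graph figure is not reproduced in these notes, the sketch below encodes a small adjacency list consistent with the path above (the graph itself is an assumption) and runs a stack-based DFS:

def dfs(graph, start, goal):
    """Stack-based depth-first search; returns the first path found to goal."""
    stack, visited = [[start]], set()
    while stack:
        path = stack.pop()                 # last in, first out
        node = path[-1]
        if node == goal:
            return path
        if node not in visited:
            visited.add(node)
            for nbr in reversed(graph.get(node, [])):
                stack.append(path + [nbr])

graph = {"S": ["A", "D"], "A": ["B"], "B": ["C"], "C": ["G"], "D": ["G"]}
print(dfs(graph, "S", "G"))   # ['S', 'A', 'B', 'C', 'G']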

Breadth First Search:


Breadth-first search (BFS) is an algorithm for traversing or searching tree or
graph data structures. It starts at the tree root (or some arbitrary node of a
graph, sometimes referred to as a ‘search key’), and explores all of the
neighbor nodes at the present depth prior to moving on to the nodes at the
next depth level. It is implemented using a queue.
Example:
Question. Which solution would BFS find to move from node S to node G if
run on the graph below?
Path: S -> D -> G
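The same hypothetical graph searched with a queue instead of a stack; BFS returns the shallowest solution, matching the path above:

from collections import deque

def bfs(graph, start, goal):
    """Queue-based breadth-first search; explores level by level."""
    queue, visited = deque([[start]]), {start}
    while queue:
        path = queue.popleft()             # first in, first out
        if path[-1] == goal:
            return path
        for nbr in graph.get(path[-1], []):
            if nbr not in visited:
                visited.add(nbr)
                queue.append(path + [nbr])

graph = {"S": ["A", "D"], "A": ["B"], "B": ["C"], "C": ["G"], "D": ["G"]}
print(bfs(graph, "S", "G"))   # ['S', 'D', 'G']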

Uniform Cost Search:


UCS is different from BFS and DFS because here the costs come into play. In
other words, traversing via different edges might not have the same cost. The
goal is to find a path where the cumulative sum of costs is the least.
The cost of a node is defined as:
cost(node) = cumulative cost of the path from the root to that node
cost(root) = 0
Example:
Question. Which solution would UCS find to move from node S to node G if
run on the graph below?
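Since the graph figure is not reproduced, the following sketch runs UCS over a hypothetical weighted graph, using a priority queue ordered by cumulative cost:

import heapq

def ucs(graph, start, goal):
    """Uniform cost search: always expands the cheapest frontier node."""
    frontier = [(0, start, [start])]       # (cumulative cost, node, path)
    explored = set()
    while frontier:
        cost, node, path = heapq.heappop(frontier)
        if node == goal:
            return cost, path
        if node in explored:
            continue
        explored.add(node)
        for nbr, step in graph.get(node, []):
            heapq.heappush(frontier, (cost + step, nbr, path + [nbr]))

graph = {"S": [("A", 1), ("D", 3)], "A": [("B", 2)],
         "B": [("G", 5)], "D": [("G", 6)]}
print(ucs(graph, "S", "G"))   # (8, ['S', 'A', 'B', 'G'])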
Informed Search Algorithms:
Here, the algorithms have information about the goal state, which makes the search more efficient. This information is obtained from something called a heuristic.
In this section, we will discuss the following search algorithms.
1. Greedy Search
2. A* Tree Search
3. A* Graph Search
Search Heuristics: In an informed search, a heuristic is a function that estimates how close a state is to the goal state. Examples include the Manhattan distance and the Euclidean distance (the smaller the distance, the closer the goal).

Greedy Search:
In greedy search, we expand the node closest to the goal node. The "closeness" is estimated by a heuristic h(x).
Question. Find the path from S to G using greedy search. The heuristic value h of each node is given below the name of the node.
Solution. Starting from S, we can traverse to A (h=9) or D (h=5). We choose D, as it has the lower heuristic cost. From D, we can move to B (h=4) or E (h=3). We choose E, with the lower heuristic cost. Finally, from E, we go to G (h=0). This entire traversal is shown in the search tree below, in blue.

Path: S -> D -> E -> G

A* Tree Search:
A* tree search, or simply A* search, combines the strengths of uniform-cost search and greedy search. In this search, the evaluation function is the sum of the cost used in UCS, denoted by g(x), and the heuristic used in greedy search, denoted by h(x). The summed cost is denoted by f(x).
Heuristic: The following points should be noted regarding heuristics in A* search.
 Here, h(x) is called the forward cost and is an estimate of the distance of the current node from the goal node.
 And g(x) is called the backward cost and is the cumulative cost of a node from the root node.
 Example:
 Question. Find the path to reach from S to G using A* search.

 Path: S -> D -> B -> E -> G
Cost: 7
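A sketch of A* tree search with f(x) = g(x) + h(x); the edge costs and heuristic values below are hypothetical stand-ins for the missing figure, chosen so the result matches the path and cost above:

import heapq

def a_star(graph, h, start, goal):
    """A* tree search: expands the node with the lowest f = g + h."""
    frontier = [(h[start], 0, start, [start])]   # (f, g, node, path)
    while frontier:
        f, g, node, path = heapq.heappop(frontier)
        if node == goal:
            return g, path
        for nbr, step in graph.get(node, []):
            g2 = g + step                        # backward cost g(x)
            heapq.heappush(frontier, (g2 + h[nbr], g2, nbr, path + [nbr]))

graph = {"S": [("A", 2), ("D", 1)], "A": [("G", 6)],
         "D": [("B", 2)], "B": [("E", 1)], "E": [("G", 3)]}
h = {"S": 6, "A": 5, "D": 4, "B": 3, "E": 2, "G": 0}   # forward cost h(x)
print(a_star(graph, h, "S", "G"))   # (7, ['S', 'D', 'B', 'E', 'G'])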
A* Graph Search:
 A* tree search works well, except that it takes time re-exploring branches it has already explored: if the same node is expanded twice in different branches of the search tree, A* search may explore both of those branches, wasting time.
 A* graph search, or simply graph search, removes this limitation by adding the rule: do not expand the same node more than once.
 Heuristic. Graph search is optimal only when the forward cost between two successive nodes A and B, given by h(A) - h(B), is less than or equal to the backward cost between those two nodes, g(A -> B). This property of the graph search heuristic is called consistency.
Consistency:
Example:
Question. Use graph search to find a path from S to G in the following graph.
Solution. We solve this in much the same way as the previous question, but in this case we keep track of the explored nodes so that we do not re-explore them.

Path: S -> D -> B -> C -> E -> G


Cost: 7
Unit 2
Random Search

Random search is a technique in which random combinations of the hyperparameters are used to find the best solution for the built model. It tries random combinations drawn from a range of values. To optimise with random search, the function is evaluated at some number of random configurations in the parameter space.
The chances of finding the optimal parameters are comparatively higher in random search, because the random search pattern means the model might end up being trained on the optimised parameters without any aliasing. Random search works best for lower-dimensional data, since the time taken to find the right set is lower when fewer iterations are needed. Random search is the best parameter search technique when there is a smaller number of dimensions.
There are many theoretical and practical concerns when evaluating optimisation strategies. The best strategy for your problem is the one that finds the best value fastest and with the fewest function evaluations, and it may vary from problem to problem.
Random search involves generating and evaluating random inputs to the objective
function. It’s effective because it does not assume anything about the structure of the
objective function. We can generate a random sample from a domain using
a pseudorandom number generator. Each variable requires a well-defined bound or range
and a uniformly random value can be sampled from the range, then evaluated.
Generating random samples is computationally trivial and does not take up much memory,
therefore, it may be efficient to generate a large sample of inputs, then evaluate them.
Each sample is independent, so samples can be evaluated in parallel if needed to
accelerate the process.
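The example code itself is not reproduced in these notes; the following is a minimal sketch of random search, assuming a one-dimensional objective f(x) = x^2 minimised over the range [-5, 5]:

import random

def objective(x):
    """Hypothetical objective with its optimum at x = 0."""
    return x ** 2.0

random.seed(1)                       # fix the seed for repeatability
bound = (-5.0, 5.0)                  # well-defined range for the variable
samples = [random.uniform(*bound) for _ in range(100)]   # random sample
scores = [objective(x) for x in samples]                 # evaluate each input
best = min(range(len(samples)), key=lambda i: scores[i])
print("Best: f(%.5f) = %.5f" % (samples[best], scores[best]))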
Running the example generates a random sample of input values, which are then
evaluated. The best performing point is then identified and reported.
Note: Your results may vary given the stochastic nature of the algorithm or evaluation
procedure, or differences in numerical precision. Consider running the example a few
times and compare the average outcome.
In this case, we can see that the result is very close to the optimal input of 0.0.
Best: f(-0.01762) = 0.00031


Search with closed and open lists

Unlike an uninformed search, an informed search algorithm has access to knowledge such as how far we are from the goal, the path cost, and how to reach the goal node. This knowledge helps agents explore less of the search space and find the goal node more efficiently.
The informed search algorithm is more useful for large search spaces. Because informed search uses the idea of a heuristic, it is also called heuristic search.
Heuristic function: A heuristic is a function used in informed search to find the most promising path. It takes the current state of the agent as its input and produces an estimate of how close the agent is to the goal. The heuristic method might not always give the best solution, but it is guaranteed to find a good solution in reasonable time.
The heuristic function estimates how close a state is to the goal. It is represented by h(n), and it estimates the cost of an optimal path between a pair of states. The value of the heuristic function is always positive. For the heuristic to be admissible, it must satisfy

h(n) <= h*(n)

where h(n) is the estimated (heuristic) cost and h*(n) is the true optimal cost; the heuristic cost should be less than or equal to the true cost.

Pure Heuristic Search:

Pure heuristic search is the simplest form of heuristic search algorithm. It expands nodes based on their heuristic value h(n). It maintains two lists, OPEN and CLOSED: in the CLOSED list it places the nodes that have already been expanded, and in the OPEN list the nodes that have not yet been expanded.

On each iteration, the node n with the lowest heuristic value is expanded, all of its successors are generated, and n is placed on the closed list. The algorithm continues until a goal state is found.

Best first search algorithm:

o Step 1: Place the starting node in the OPEN list.
o Step 2: If the OPEN list is empty, stop and return failure.
o Step 3: Remove from the OPEN list the node n which has the lowest value of h(n), and place it in the CLOSED list.
o Step 4: Expand the node n and generate its successors.
o Step 5: Check each successor of node n to see whether it is a goal node. If any successor is a goal node, return success and terminate the search; otherwise proceed to Step 6.
o Step 6: For each successor node, the algorithm evaluates the evaluation function f(n) and checks whether the node is already in the OPEN or CLOSED list. If the node is in neither list, add it to the OPEN list.
o Step 7: Return to Step 2.
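A sketch of these steps in Python, using a heap as the OPEN list and a set as the CLOSED list; the graph and h-values are hypothetical, chosen to reproduce the greedy search example earlier. One simplification: the goal test is applied when a node is removed from OPEN rather than at successor generation:

import heapq

def best_first(graph, h, start, goal):
    open_list = [(h[start], start, [start])]      # Step 1: start node on OPEN
    closed = set()
    while open_list:                              # Step 2: fail if OPEN empties
        _, node, path = heapq.heappop(open_list)  # Step 3: lowest h(n) to CLOSED
        if node == goal:                          # goal test (on removal here)
            return path
        closed.add(node)
        for nbr in graph.get(node, []):           # Step 4: expand node n
            if nbr not in closed:                 # Step 6: skip expanded nodes
                heapq.heappush(open_list, (h[nbr], nbr, path + [nbr]))

graph = {"S": ["A", "D"], "D": ["B", "E"], "E": ["G"], "A": ["B"], "B": []}
h = {"S": 6, "A": 9, "D": 5, "B": 4, "E": 3, "G": 0}
print(best_first(graph, h, "S", "G"))   # ['S', 'D', 'E', 'G']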
Game Search
Adversarial search is search in which we examine problems that arise when we try to plan ahead in a world where other agents are planning against us.
o There can be situations where more than one agent is searching for a solution in the same search space; this usually occurs in game playing.
o An environment with more than one agent is termed a multi-agent environment, in which each agent is an opponent of the others and plays against them. Each agent needs to consider the actions of the other agents and the effect of those actions on its own performance.
o So, searches in which two or more players with conflicting goals try to explore the same search space for a solution are called adversarial searches, often known as games.
o Games are modelled as a search problem together with a heuristic evaluation function; these are the two main factors that help model and solve games in AI.

Types of Games in AI:


 | Deterministic | Chance moves
Perfect information | Chess, Checkers, Go, Othello | Backgammon, Monopoly
Imperfect information | Battleship, blind tic-tac-toe | Bridge, Poker, Scrabble, nuclear war

o Perfect information: A game with perfect information is one in which agents can see the complete board. Agents have all the information about the game and can see each other's moves. Examples are Chess, Checkers, Go, etc.
o Imperfect information: If the agents in a game do not have all the information about the game and are not aware of everything that is going on, it is a game of imperfect information, such as blind tic-tac-toe, Battleship, Bridge, etc.
o Deterministic games: Deterministic games follow a strict pattern and set of rules, with no randomness associated with them. Examples are Chess, Checkers, Go, tic-tac-toe, etc.
o Non-deterministic games: Non-deterministic games have various unpredictable events and a factor of chance or luck, introduced by dice or cards. These games are random, and each action's outcome is not fixed. Such games are also called stochastic games.
Example: Backgammon, Monopoly, Poker, etc.
Note: In this topic, we will discuss deterministic, fully observable, zero-sum games in which the agents act alternately.

Zero-Sum Game
o Zero-sum games are adversarial searches involving pure competition.
o In a zero-sum game, each agent's gain or loss of utility is exactly balanced by the losses or gains of utility of the other agent.
o One player tries to maximize a single value, while the other player tries to minimize it.
o Each move by one player in the game is called a ply.
o Chess and tic-tac-toe are examples of zero-sum games.
Zero-sum game: Embedded thinking
Zero-sum games involve embedded thinking, in which one agent or player is trying to figure out:
o what to do;
o how to decide on the move;
o what the opponent needs to be considered as well, since
o the opponent is also thinking about what to do.

Each player tries to figure out their opponent's response to their actions. This requires embedded thinking, or backward reasoning, to solve game problems in AI.
Formalization of the problem:
A game can be defined as a type of search in AI which can be formalized in terms of the following elements:

o Initial state: It specifies how the game is set up at the start.

o Player(s): It specifies which player has the move in a given state.
o Action(s): It returns the set of legal moves in a given state.
o Result(s, a): The transition model, which specifies the result of making move a in state s.
o Terminal-Test(s): The terminal test is true if the game is over and false otherwise. States where the game has ended are called terminal states.
o Utility(s, p): A utility function gives the final numeric value of a game that ends in terminal state s for player p. It is also called the payoff function. For Chess, the outcomes are a win, loss, or draw, with payoff values +1, 0, or 1/2. For tic-tac-toe, the utility values are +1, -1, and 0.
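These elements are enough to write a generic minimax sketch for a two-player zero-sum game. The tiny two-ply game tree below is hypothetical, and the dictionary of functions simply mirrors the formal elements above:

def minimax(state, maximizing, game):
    """Returns the minimax value of a state for alternating MAX/MIN players."""
    if game["terminal"](state):
        return game["utility"](state)
    values = [minimax(game["result"](state, a), not maximizing, game)
              for a in game["actions"](state)]
    return max(values) if maximizing else min(values)

# Hypothetical two-ply game: MAX picks a branch, then MIN picks a leaf value.
tree = {"root": ["L", "R"], "L": [3, 5], "R": [2, 9]}
game = {
    "terminal": lambda s: isinstance(s, int),   # leaves are terminal states
    "utility": lambda s: s,                     # leaf value = payoff
    "actions": lambda s: range(len(tree[s])),
    "result": lambda s, a: tree[s][a],
}
print(minimax("root", True, game))   # 3: MAX picks L, where MIN's best reply is 3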

Unit 3

Definition
The probability of occurrence of an event A when another event B related to A has already occurred is known as conditional probability. It is denoted by P(A|B).
Let the sample space be S, with two events A and B. In a situation where event B has already occurred, our sample space S naturally reduces to B, because the chances of occurrence of an event now lie inside B.
Since we must figure out the chances of occurrence of event A, only the portion common to both A and B is enough to represent the probability of occurrence of A when B has already occurred. The common portion of the events is the intersection of the two events A and B, i.e., A ∩ B.
This is the concept behind conditional probability: the occurrence of an event when another event related to it has already occurred.

Formula
When the intersection of two events is considered, the formula for the conditional probability of the occurrence of one event given the other is:

P(A|B) = N(A ∩ B)/N(B)
or
P(B|A) = N(A ∩ B)/N(A)

where P(A|B) represents the probability of occurrence of A given that B has occurred,
N(A ∩ B) is the number of elements common to both A and B, and
N(B) is the number of elements in B, which cannot be equal to zero.
If N is the total number of elements in the sample space, dividing the numerator and denominator by N gives the equivalent form P(A|B) = P(A ∩ B)/P(B).
Question 1:
The probability that it is Friday and that a student is absent is 0.03. Since there are 5 school days in a week, the probability that it is Friday is 0.2. What is the probability that a student is absent given that today is Friday?
Solution:
The conditional probability formula is:
P(B|A) = P(A ∩ B)/P(A)
P(Absent | Friday) = P(Absent and Friday)/P(Friday)
= 0.03/0.2
= 0.15
= 15%
Question: A bag contains 3 black and 5 white balls. Draw the probability tree diagram for two draws.
Solution: Given:
Number of black balls = 3
Number of white balls = 5
Total number of balls = 8
So, the probability of drawing a black ball = 3/8
The probability of drawing a white ball = 5/8
The tree diagram for two draws of balls, with the possible outcomes and probabilities, is given below.
Conditional Probability and Bayes' Theorem
Bayes' theorem gives the probability of occurrence of an event associated with a condition. It applies to the case of conditional probability and is also known as the formula for the likelihood of "causes".

Properties
Property 1: Let E and F be events of a sample space S of an experiment, then we have:
P(S|F) = P(F|F) = 1.
Property 2: If A and B are any two events of a sample space S, and F is an event of S such that P(F) ≠ 0, then:
P((A ∪ B)|F) = P(A|F) + P(B|F) − P((A ∩ B)|F)
Property 3: P(A′|B) = 1 − P(A|B)

Problems and Solutions


Example 1: Two dies are thrown simultaneously, and the sum of the numbers obtained is found
to be 7. What is the probability that the number 3 has appeared at least once?
Solution: The sample space S would consist of all the numbers possible by the combination of
two dies. Therefore, S consists of 6 × 6, i.e. 36 events.
Event A indicates the combination in which 3 has appeared at least once.
Event B indicates the combination of the numbers which sum up to 7.
A = {(3, 1), (3, 2), (3, 3)(3, 4)(3, 5)(3, 6)(1, 3)(2, 3)(4, 3)(5, 3)(6, 3)}
B = {(1, 6)(2, 5)(3, 4)(4, 3)(5, 2)(6, 1)}
P(A) = 11/36
P(B) = 6/36
A∩B=2
P(A ∩ B) = 2/36
Applying the conditional probability formula we get,
P(A|B) = P(A∩B)/P(B) = (2/36)/(6/36) = ⅓
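A quick brute-force check of this example in Python, enumerating all 36 outcomes:

from fractions import Fraction

S = [(i, j) for i in range(1, 7) for j in range(1, 7)]   # all 36 outcomes
A = [s for s in S if 3 in s]                             # a 3 appears at least once
B = [s for s in S if sum(s) == 7]                        # the numbers sum to 7
A_and_B = [s for s in B if s in A]
print(Fraction(len(A_and_B), len(B)))                    # 1/3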
Bayesian Belief Networks in artificial intelligence
A Bayesian belief network is a key computer technology for dealing with probabilistic events and for solving problems that involve uncertainty. We can define a Bayesian network as:

"A Bayesian network is a probabilistic graphical model which represents a set of variables and their conditional dependencies using a directed acyclic graph."

It is also called a Bayes network, belief network, decision network, or Bayesian model.

Bayesian networks are probabilistic because they are built from a probability distribution and also use probability theory for prediction and anomaly detection.

Real-world applications are probabilistic in nature, and to represent the relationships between multiple events we need a Bayesian network. It can be used in various tasks, including prediction, anomaly detection, diagnostics, automated insight, reasoning, time-series prediction, and decision making under uncertainty.

A Bayesian network can be used to build models from data and expert opinions, and it consists of two parts:
o a directed acyclic graph, and
o a table of conditional probabilities.

The generalized form of a Bayesian network that represents and solves decision problems under uncertain knowledge is known as an influence diagram.

A Bayesian network graph is made up of nodes and arcs (directed links), where:

o Each node corresponds to a random variable, and a variable can be continuous or discrete.
o Arcs or directed arrows represent the causal relationships or conditional probabilities between random variables. These directed links connect pairs of nodes in the graph, and a link means that one node directly influences the other; if there is no directed link, the nodes are independent of each other.
o As an example, consider a network graph whose nodes are the random variables A, B, C, and D.
o If node B is connected to node A by a directed arrow from A to B, then node A is called the parent of node B.
o A node C that has no link to node A is independent of node A.

Note: The Bayesian network graph does not contain any cycles. Hence, it is known as a directed acyclic graph, or DAG.

The Bayesian network has two main components:

o the causal component, and
o the actual numbers.

Each node in the Bayesian network has a conditional probability distribution P(Xi | Parent(Xi)), which determines the effect of the parent on that node.

A Bayesian network is based on the joint probability distribution and conditional probability, so let's first understand the joint probability distribution:

Joint probability distribution:

If we have variables x1, x2, x3, ..., xn, then the probabilities of the different combinations of x1, x2, x3, ..., xn are known as the joint probability distribution.

By the chain rule, P[x1, x2, x3, ..., xn] can be written as follows in terms of conditional probabilities:

P[x1, x2, x3, ..., xn] = P[x1 | x2, x3, ..., xn] P[x2, x3, ..., xn]
= P[x1 | x2, x3, ..., xn] P[x2 | x3, ..., xn] ... P[xn-1 | xn] P[xn]

In general, for each variable Xi we can write the equation as:

P(Xi | Xi-1, ..., X1) = P(Xi | Parents(Xi))

Example: Harry installed a new burglar alarm at his home to detect burglary. The alarm responds reliably to burglaries, but it also responds to minor earthquakes. Harry has two neighbors, David and Sophia, who have taken responsibility for informing Harry at work when they hear the alarm. David always calls Harry when he hears the alarm, but sometimes he confuses the phone ringing with the alarm and calls then too. On the other hand, Sophia likes to listen to loud music, so she sometimes misses the alarm. Here we would like to compute the probability of the burglary alarm events.

List of all events occurring in this network:


o Burglary (B)
o Earthquake(E)
o Alarm(A)
o David Calls(D)
o Sophia calls(S)

We can write the events of the problem statement in the form of the probability P[D, S, A, B, E], and rewrite this probability statement using the joint probability distribution:

P[D, S, A, B, E] = P[D | S, A, B, E] P[S, A, B, E]
= P[D | S, A, B, E] P[S | A, B, E] P[A, B, E]
= P[D | A] P[S | A, B, E] P[A, B, E]
= P[D | A] P[S | A] P[A | B, E] P[B, E]
= P[D | A] P[S | A] P[A | B, E] P[B | E] P[E]


P(B = True) = 0.002, the probability of a burglary.

P(B = False) = 0.998, the probability of no burglary.

P(E = True) = 0.001, the probability of a minor earthquake.

P(E = False) = 0.999, the probability that no earthquake occurred.

We can provide the conditional probabilities as per the tables below:

Conditional probability table for Alarm A:

The conditional probability of Alarm A depends on Burglary and Earthquake:

B | E | P(A = True) | P(A = False)
True | True | 0.94 | 0.06
True | False | 0.95 | 0.05
False | True | 0.31 | 0.69
False | False | 0.001 | 0.999

Conditional probability table for David's calls:

The conditional probability that David will call depends on the probability of the Alarm:

A | P(D = True) | P(D = False)
True | 0.91 | 0.09
False | 0.05 | 0.95

Conditional probability table for Sophia's calls:

The conditional probability that Sophia calls depends on its parent node, "Alarm":

A | P(S = True) | P(S = False)
True | 0.75 | 0.25
False | 0.02 | 0.98

From the formula for the joint distribution, we can write the problem statement in the form of a probability:

P(S, D, A, ¬B, ¬E) = P(S|A) * P(D|A) * P(A|¬B ∧ ¬E) * P(¬B) * P(¬E)

= 0.75 * 0.91 * 0.001 * 0.998 * 0.999

= 0.00068045

Hence, a Bayesian network can answer any query about the domain by using the joint distribution.
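A sketch of this computation in Python, with the CPT values from the tables above hard-coded as dictionaries:

P_B = {True: 0.002, False: 0.998}
P_E = {True: 0.001, False: 0.999}
P_A = {(True, True): 0.94, (True, False): 0.95,
       (False, True): 0.31, (False, False): 0.001}   # P(A=True | B, E)
P_D = {True: 0.91, False: 0.05}                      # P(D=True | A)
P_S = {True: 0.75, False: 0.02}                      # P(S=True | A)

# P(S, D, A, ¬B, ¬E) = P(S|A) P(D|A) P(A|¬B,¬E) P(¬B) P(¬E)
p = P_S[True] * P_D[True] * P_A[(False, False)] * P_B[False] * P_E[False]
print(round(p, 8))   # 0.00068045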

Rules of Inference in Artificial Intelligence

Inference:
In artificial intelligence, we need intelligent computers that can create new logic from old logic or from evidence; generating conclusions from evidence and facts is termed inference.

Inference rules:
Inference rules are templates for generating valid arguments. They are applied to derive proofs in artificial intelligence, where a proof is a sequence of conclusions that leads to the desired goal.

In inference rules, the implication plays an important role among all the connectives. The following terminology relates to inference rules:
o Implication: One of the logical connectives, which can be represented as P → Q. It is a Boolean expression.
o Converse: The converse of an implication swaps the right-hand side and left-hand side propositions; it can be written as Q → P.
o Contrapositive: The negation of the converse, which can be represented as ¬Q → ¬P.
o Inverse: The negation of the implication, which can be represented as ¬P → ¬Q.

Some of the compound statements above are equivalent to each other, which we can prove using a truth table:

From the truth table, we can prove that P → Q is equivalent to ¬Q → ¬P, and that Q → P is equivalent to ¬P → ¬Q.

Types of Inference rules:


1. Modus Ponens:
The Modus Ponens rule is one of the most important rules of inference. It states that if P and P → Q are true, then we can infer that Q will be true. It can be represented as:

((P → Q) ∧ P) ⊢ Q

Proof by Truth table:


2. Modus Tollens:
The Modus Tollens rule states that if P → Q is true and ¬Q is true, then ¬P will also be true. It can be represented as:

((P → Q) ∧ ¬Q) ⊢ ¬P

Statement 1: "If I am sleepy then I go to bed." ==> P → Q
Statement 2: "I do not go to the bed." ==> ¬Q
Statement 3: Which infers that "I am not sleepy." ==> ¬P

Proof by Truth table:

3. Hypothetical Syllogism:
The Hypothetical Syllogism rule states that if P → Q is true and Q → R is true, then P → R is true. It can be represented with the following notation:

((P → Q) ∧ (Q → R)) ⊢ (P → R)

Example:

Statement 1: If you have my home key then you can unlock my home. P → Q
Statement 2: If you can unlock my home then you can take my money. Q → R
Conclusion: If you have my home key then you can take my money. P → R

Proof by truth table:


4. Disjunctive Syllogism:
The Disjunctive Syllogism rule states that if P ∨ Q is true and ¬P is true, then Q will be true. It can be represented as:

((P ∨ Q) ∧ ¬P) ⊢ Q

Example:

Statement 1: Today is Sunday or Monday. ==> P ∨ Q
Statement 2: Today is not Sunday. ==> ¬P
Conclusion: Today is Monday. ==> Q

Proof by truth table:

5. Addition:
The Addition rule is one of the common inference rules. It states that if P is true, then P ∨ Q will be true. It can be represented as:

P ⊢ (P ∨ Q)

Example:

Statement: I have vanilla ice-cream. ==> P
(Let Q be: I have chocolate ice-cream.)
Conclusion: I have vanilla or chocolate ice-cream. ==> (P ∨ Q)

Proof by Truth-Table:

6. Simplification:
The Simplification rule states that if P ∧ Q is true, then P and Q will each also be true. It can be represented as:

(P ∧ Q) ⊢ P and (P ∧ Q) ⊢ Q

Proof by Truth-Table:

7. Resolution:
The Resolution rule states that if P ∨ Q and ¬P ∨ R are true, then Q ∨ R will also be true. It can be represented as:

((P ∨ Q) ∧ (¬P ∨ R)) ⊢ (Q ∨ R)

Proof by Truth-Table:
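The truth tables themselves are not reproduced in these notes, but a short Python sketch can verify an inference rule by enumerating all truth assignments; here it checks the resolution rule:

from itertools import product

# (P ∨ Q) ∧ (¬P ∨ R) entails (Q ∨ R): check every row of the truth table.
valid = all((Q or R)                           # conclusion
            for P, Q, R in product([True, False], repeat=3)
            if (P or Q) and ((not P) or R))    # only rows where both premises hold
print(valid)   # True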
Hidden Markov Model (HMM)

We use an HMM when we cannot observe the states themselves, but only the result of some probability function (observation) of the states. An HMM is a statistical Markov model in which the system being modeled is assumed to be a Markov process with unobserved (hidden) states.

Markov Model: a series of (hidden) states z = {z_1, z_2, ...} drawn from a state alphabet S = {s_1, s_2, ..., s_|S|}, where each z_i belongs to S.

Hidden Markov Model: a series of observed outputs x = {x_1, x_2, ...} drawn from an output alphabet V = {v_1, v_2, ..., v_|V|}, where each x_i belongs to V.

Assumptions of HMM

The HMM, too, is built upon several assumptions, of which the following is vital.

 Output independence assumption: the output observation is conditionally independent of all other hidden states and all other observations, given the current hidden state:

P(x_t | z_1, ..., z_T, x_1, ..., x_{t-1}, x_{t+1}, ..., x_T) = P(x_t | z_t)

 Emission Probability Matrix: the probability of the hidden state generating output v_i, given that the state at the corresponding time was s_j; often written as a matrix B with entries b_j(i) = P(x_t = v_i | z_t = s_j).
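A tiny illustrative HMM in Python; the weather states, outputs, and probabilities below are invented for illustration, and the last lines compute two steps of the forward (filtering) recursion:

states = ["Rainy", "Sunny"]                          # hidden state alphabet S
start = {"Rainy": 0.6, "Sunny": 0.4}                 # initial state distribution
trans = {"Rainy": {"Rainy": 0.7, "Sunny": 0.3},      # transition matrix
         "Sunny": {"Rainy": 0.4, "Sunny": 0.6}}
emit = {"Rainy": {"walk": 0.1, "shop": 0.4, "clean": 0.5},   # emission matrix:
        "Sunny": {"walk": 0.6, "shop": 0.3, "clean": 0.1}}   # P(x = v | z = s)

# Step 1: alpha_1(s) = P(z_1 = s) * P(x_1 = "walk" | z_1 = s)
alpha = {s: start[s] * emit[s]["walk"] for s in states}
print(alpha)    # approx {'Rainy': 0.06, 'Sunny': 0.24}

# Step 2: alpha_2(s) = [sum_p alpha_1(p) * P(z_2 = s | z_1 = p)] * P(x_2 = "shop" | z_2 = s)
alpha2 = {s: sum(alpha[p] * trans[p][s] for p in states) * emit[s]["shop"]
          for s in states}
print(alpha2)   # approx {'Rainy': 0.0552, 'Sunny': 0.0486}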
