
Sreenidhi Institute of Science and Technology

(An Autonomous Institution)


Code No:8EC08 (Scheme of Evaluation) Date:27-12-2023(FN)
B.Tech IV-Year I-Semester External Examinations, December-2024
(Regular and Supplementary)
ARTIFICIAL INTELLIGENCE AND DEEP LEARNING (CSE and IT)
Time: 3 Hours Max. Marks:70
Note: a) No additional answer sheets will be provided.
b) All sub-parts of a question must be answered at one place only, otherwise it will not be valued.
c) Assume any missing data.

Remember: L1    Understand: L2    Apply: L3    Analyze: L4    Evaluate: L5    Create: L6

PART-A (Max. Marks: 20)
1 Mention different types of Agents? [2M]
Scheme: Listing of 4 types of agents: 2 Marks
Answer:
1. Simple Reflex Agent
2. Model-based Agent
3. Goal-based Agent
4. Utility-based Agent
2 What is meant by heuristic? [2M]
Scheme: 2 marks for the following
Answer:
Heuristic: A heuristic function, also known as a heuristic, is a function that
estimates the cost or value associated with reaching a goal state from a given state
in a search problem. The heuristic function is used in heuristic search algorithms
to guide the search toward more promising paths.
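For illustration beyond the prescribed answer, a common admissible heuristic for path-finding on a 4-connected grid is the Manhattan distance; a minimal Python sketch (the grid setting is an assumption, not part of the question):

def manhattan_heuristic(state, goal):
    # Estimated remaining cost on a 4-connected grid: the sum of the
    # horizontal and vertical offsets. It never overestimates the true
    # cost, so it is admissible.
    (x1, y1), (x2, y2) = state, goal
    return abs(x1 - x2) + abs(y1 - y2)

print(manhattan_heuristic((0, 0), (3, 4)))   # 7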

3 What is knowledge based agent? [2M]


Scheme:2 marks for the following
Answer:
A knowledge-based agent in artificial intelligence (AI) is a system that uses a
knowledge base and logical reasoning to make decisions. Knowledge-based
agents can analyze data, apply rules, and adapt to changing scenarios.
4 Write Bayes Rule? [2M]
Scheme:2 marks for the following
Answer:
Bayes' Rule, also known as Bayes' Theorem or Bayes' Law, is a fundamental
principle in probability theory that describes the relationship between conditional
probabilities. It provides a way to update or revise beliefs about the probability of
an event based on new evidence or information. Bayes' Rule is named after the
Reverend Thomas Bayes, an 18th-century statistician and theologian who
introduced the concept.
The formula for Bayes' Rule is given by:
P(A∣B)=P(B∣A)*P(A)/P(B)
Here's a breakdown of the terms in the formula:
P(A∣B): The probability of event A occurring given that event B has occurred
(posterior probability).
P(B∣A): The probability of event B occurring given that event A has occurred
(likelihood).
P(A): The prior probability of event A, i.e., the probability of A occurring without
considering B.
P(B): The prior probability of event B, i.e., the probability of B occurring without
considering A.
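A small worked sketch in Python (the disease-testing numbers below are assumed purely for illustration):

def bayes_posterior(p_b_given_a, p_a, p_b):
    # Bayes' Rule: P(A|B) = P(B|A) * P(A) / P(B)
    return p_b_given_a * p_a / p_b

# Assumed numbers: P(Disease) = 0.01, P(Positive|Disease) = 0.95,
# P(Positive|No Disease) = 0.05, so by total probability:
p_a = 0.01
p_b_given_a = 0.95
p_b = 0.95 * 0.01 + 0.05 * 0.99
print(bayes_posterior(p_b_given_a, p_a, p_b))   # about 0.161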

5 What is a Convolutional Neural Network and how does it differ from traditional [2M]
neural networks?
Scheme:2 marks for the following
Answer: A convolutional neural network (CNN) is a type of artificial neural
network specifically designed for image recognition and processing, excelling at
identifying patterns in images by leveraging a unique architecture that includes
convolutional layers, which allows it to extract features from an image more
effectively compared to traditional neural networks that process data in a more
generic way; essentially, CNNs are optimized for visual data while traditional
neural networks can be applied to various data types like text or numerical data.
6 Define Markov Decision Process (MDP)? [2M]
Scheme: 2 marks for the following definition
Answer: A Markov Decision Process (MDP) is a mathematical framework used
to model decision-making situations where outcomes are partly random and partly
under the control of a decision-maker. It provides a formal way to model problems
in various domains, such as robotics, economics, healthcare, and artificial
intelligence, particularly reinforcement learning.
7 What do you mean by rationality? [2M]
Scheme:2 Marks for the following
Rationality in AI refers to the ability of an artificial intelligence system or agent
to make decisions or take actions that maximize its performance or achieve its
goals based on the knowledge it has about the environment. Rationality is a
cornerstone of intelligent behavior and is central to the design of AI systems.
8 What is uncertainty? [2M]
Scheme:2 marks for the following
Uncertainty in AI refers to situations where an AI system lacks complete or
perfect information about the environment, outcomes, or future events. This
uncertainty arises because real-world environments are often dynamic, noisy, and
complex, making it impossible for the system to know everything or predict
outcomes with absolute certainty.
9 Define max pooling in CNNs. [2M]
Scheme:2 marks for the following
Max pooling is a down-sampling operation commonly used in Convolutional
Neural Networks (CNNs) to reduce the spatial dimensions (height and width) of
feature maps while retaining the most important information. This operation helps
make the model more computationally efficient and robust to small spatial
variations in the input.
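A minimal NumPy sketch of non-overlapping 2x2 max pooling with stride 2 (illustrative only; deep learning frameworks provide optimized pooling layers):

import numpy as np

def max_pool_2x2(feature_map):
    # Crop to an even size, group the map into 2x2 blocks, keep each block's maximum.
    h, w = feature_map.shape
    cropped = feature_map[:h - h % 2, :w - w % 2]
    blocks = cropped.reshape(cropped.shape[0] // 2, 2, cropped.shape[1] // 2, 2)
    return blocks.max(axis=(1, 3))

x = np.array([[1, 3, 2, 4],
              [5, 6, 1, 2],
              [7, 2, 9, 0],
              [1, 8, 3, 4]])
print(max_pool_2x2(x))   # [[6 4]
                         #  [8 9]]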
10 Mention the significance of experience replay in DQN? [2M]
Scheme: 2 marks for the following
Experience Replay is a key technique used in Deep Q-Learning (DQN) to
improve the stability, efficiency, and performance of training. It involves storing
the agent's past experiences in a replay buffer and sampling mini-batches of these
experiences for training, rather than training on each experience in sequential
order.
PART-B (Max. Marks: 50)
11. a) Explain the properties of Task environment? [5M]
Scheme: Award 3 Marks for listing any 3 of the properties below and 2 Marks
for their explanation.
Answer:
A task environment in Artificial Intelligence (AI) refers to the setting in which an
intelligent agent operates to achieve a specific goal. Understanding the properties
of a task environment is crucial for designing effective AI systems because these
properties dictate the challenges and constraints the agent faces.
Here are the key properties of a task environment:
1. Observability
2. Deterministic vs. Stochastic
3. Static vs. Dynamic
(3 Marks)
1. Observability
 Fully Observable:
o The agent has complete access to all relevant information about the
environment's state at any given time.
o Example: Chess, where the entire board state is visible to both
players.
 Partially Observable:
o The agent has limited or noisy access to the environment's state due
to incomplete data or sensor limitations.
o Example: Autonomous driving, where the car might not see around
corners or in poor weather.
2. Deterministic vs. Stochastic
 Deterministic:
o The next state of the environment is completely determined by the
current state and the agent's actions.
o Example: Solving a math problem.
 Stochastic:
o There is some randomness in the environment's response to an
action, making the outcomes uncertain.
o Example: Poker, where the result depends on hidden cards and other
players' strategies.
3. Static vs. Dynamic
 Static:
o The environment does not change while the agent is deciding on an
action.
o Example: Solving a crossword puzzle, where the puzzle remains
unchanged during the process.
 Dynamic:
o The environment evolves over time, independent of the agent’s
actions, requiring the agent to act quickly.
o Example: Real-time strategy games or stock market trading.
4. Discrete vs. Continuous
 Discrete:
o The environment has a finite set of states, actions, and time steps.
o Example: Board games like chess or tic-tac-toe.
 Continuous:
o The environment has continuous states, actions, or time, making it
more complex to model.
o Example: Driving a car, where speed and steering angles can take
continuous values.
5. Single-Agent vs. Multi-Agent
 Single-Agent:
o There is only one agent interacting with the environment.
o Example: A robot vacuum cleaner in an empty room.
 Multi-Agent:
o Multiple agents operate in the environment, which could involve
cooperation, competition, or both.
o Example: Soccer matches, where multiple agents (players) interact
with each other.
6. Episodic vs. Sequential
 Episodic:
o The agent's experiences are divided into distinct episodes, and
actions in one episode do not affect future episodes.
o Example: Classifying images, where each image is an independent
instance.
 Sequential:
o Current actions affect future states and rewards, requiring long-term
planning.
o Example: Chess or navigation tasks.

(2 Marks)

b) Can the vacuum cleaner problem be considered an AI problem? Justify your answer with [5M]
suitable discussion.
Scheme: Award 2 Marks for Answer Yes. Award 3 Marks for Justification.
Answer:
Yes, the vacuum cleaner problem can be considered an AI problem, as it involves
designing an intelligent agent to achieve a specific goal—cleaning a predefined area
efficiently—while interacting with its environment. This problem is a classic
example used in AI to demonstrate how agents can operate in different task
environments. (2 Marks)

Justification: The vacuum cleaner problem satisfies key aspects of an AI problem:

1. Goal-Oriented Behavior

 Objective: The vacuum cleaner's goal is to clean all dirty tiles in the
environment.
 The agent needs to plan its actions (move, suck dirt, etc.) to achieve this
goal efficiently.
2. Perception and Action

 The agent (vacuum cleaner) perceives the environment (e.g., whether a tile
is clean or dirty) through sensors.
 Based on the current perception, the agent performs actions (e.g., moving to
a new tile or cleaning the current tile).

3. Interaction with the Environment

 The agent interacts dynamically with its environment. For example:


o The environment state changes when the agent cleans a dirty tile.
o The agent's actions (movement) depend on the environment's
structure (e.g., walls or obstacles).

4. Task Environment Properties

The vacuum cleaner problem demonstrates several AI-relevant environment
properties:

 Fully Observable or Partially Observable:


o If the agent can sense the entire environment at all times, it is fully
observable.
o If the agent can only sense its current location, it is partially
observable.
 Deterministic or Stochastic:
o In a deterministic environment, the agent's actions always produce
the expected results (e.g., moving left moves it to the adjacent tile).
o In a stochastic environment, there may be uncertainties, such as the
vacuum cleaner failing to clean properly.
 Static or Dynamic:
o If the environment doesn't change while the agent operates, it is
static.
o If dirt appears randomly or new obstacles arise, it is dynamic.
 Discrete or Continuous:
o A grid-like floor is a discrete environment (specific tiles).
o A room where the vacuum cleaner can move in any direction is
continuous. (3 Marks)

12. a) Discuss in detail about A* algorithm by using a suitable example? [5M]


Scheme: Award 3 Marks for algorithm and explanation and 2 Marks for
example problem.
Answer:
A* search is the most commonly known form of best-first search. It uses a heuristic
function h(n) and the cost to reach node n from the start state, g(n). It combines
features of UCS and greedy best-first search, by which it solves the problem
efficiently. The A* search algorithm finds the shortest path through the search space
using the heuristic function. It expands a smaller search tree and
provides the optimal result faster. A* is similar to UCS except that it uses
g(n)+h(n) instead of g(n).
Algorithm of A* search:
Step 1: Place the starting node in the OPEN list.
Step 2: Check if the OPEN list is empty or not; if the list is empty then return failure
and stop.
Step 3: Select the node from the OPEN list which has the smallest value of the
evaluation function (g+h); if node n is the goal node then return success and stop,
otherwise
Step 4: Expand node n and generate all of its successors, and put n into the CLOSED
list. For each successor n', check whether n' is already in the OPEN or CLOSED list;
if not, then compute the evaluation function for n' and place it into the OPEN list.
Step 5: Else if node n' is already in OPEN or CLOSED, then attach it to the back
pointer which reflects the lowest g(n') value.
Step 6: Return to Step 2. (3 Marks)

Initialization: {(S, 5)}


Iteration1: {(S--> A, 4), (S-->G, 10)}
Iteration2: {(S--> A-->C, 4), (S--> A-->B, 7), (S-->G, 10)}
Iteration3: {(S--> A-->C--->G, 6), (S--> A-->C--->D, 11), (S--> A-->B, 7), (S-->G,
10)}
Iteration 4 will give the final result, as S--->A--->C--->G it provides the optimal
path with cost 6.
(2 Marks)
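A compact Python sketch of the procedure above. The graph and heuristic values below are assumptions reconstructed to be consistent with the iteration values shown (the original figure from the question paper is not reproduced here):

import heapq

def a_star(graph, h, start, goal):
    # graph: node -> list of (neighbor, step_cost); h: heuristic estimates h(n).
    # Nodes are expanded in increasing order of f(n) = g(n) + h(n).
    open_list = [(h[start], 0, start, [start])]          # (f, g, node, path)
    closed = set()
    while open_list:
        f, g, node, path = heapq.heappop(open_list)
        if node == goal:
            return path, g
        if node in closed:
            continue
        closed.add(node)
        for nbr, cost in graph.get(node, []):
            if nbr not in closed:
                heapq.heappush(open_list, (g + cost + h[nbr], g + cost, nbr, path + [nbr]))
    return None, float("inf")

# Assumed example data (consistent with the iterations listed above):
graph = {'S': [('A', 1), ('G', 10)], 'A': [('B', 2), ('C', 1)], 'C': [('D', 3), ('G', 4)]}
h = {'S': 5, 'A': 3, 'B': 4, 'C': 2, 'D': 6, 'G': 0}
print(a_star(graph, h, 'S', 'G'))   # (['S', 'A', 'C', 'G'], 6)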

b) Obtain optimal path using minimax search on the following tree/graph structure?
(5 Marks)
13. a) Enumerate the steps in the process of resolution? [5M]
Scheme: Award 1 Mark for defining resolution and 4 Marks for Writing steps.
Answer:
The process of resolution is a fundamental inference technique used in
propositional logic and first-order logic to prove the unsatisfiability of a set of
logical statements or derive conclusions. Below are the key steps involved in the
resolution process: (1 Mark)
Steps for Resolution:

1. Conversion of facts into first-order logic.


2. Convert FOL statements into CNF
3. Negate the statement which needs to prove (proof by contradiction)
4. Draw resolution graph (unification).

(4 Marks)
b) Explain about inference in first order logic with suitable example? [5M]

Scheme: Award 3 Marks for definition and notations and 2 Marks for
explaining any one of mechanisms given.
Answer:
Inference in First-Order Logic (FOL) refers to the process of deriving new facts
or conclusions from existing knowledge and logical rules. It builds upon
propositional logic by incorporating quantifiers, predicates, functions, and variables,
enabling more expressive representations of knowledge. The goal of inference in
FOL is to determine whether a statement logically follows from a set of premises.
Key Concepts in First-Order Logic
1. Predicates: Represent relationships or properties (e.g., Likes(John, IceCream)).
2. Quantifiers:
o Universal Quantifier (∀): Means "for all" (e.g., ∀x Likes(x, IceCream)).
o Existential Quantifier (∃): Means "there exists" (e.g., ∃x Likes(x, IceCream)).
3. Terms:
o Constants: Specific entities (e.g., John, Mary).
o Variables: Represent general entities (e.g., x, y).
o Functions: Map entities to other entities (e.g., Mother(John)). (3 Marks)
Inference Mechanisms in FOL
1. Forward Chaining
 Starts with known facts and applies inference rules to derive new facts.
 Works well in knowledge bases with Horn clauses (clauses with at most one
positive literal).
Example:
1. Knowledge Base:
o Parent(John, Mary)
o Parent(Mary, Alice)
o ∀x, y, z Parent(x, y) ∧ Parent(y, z) ⟹ Grandparent(x, z)
2. Goal: Prove Grandparent(John, Alice).
3. Process:
o Match Parent(John, Mary) and Parent(Mary, Alice) with the rule's antecedent.
o Derive Grandparent(John, Alice).
2. Backward Chaining
 Starts with the goal and works backward to determine if it can be derived
from known facts and rules.
 Useful for goal-driven reasoning.
Example:
1. Knowledge Base:
o Parent(John, Mary)
o Parent(Mary, Alice)
o ∀x, y, z Parent(x, y) ∧ Parent(y, z) ⟹ Grandparent(x, z)
2. Goal: Prove Grandparent(John, Alice).
3. Process:
o Check if Parent(John, Mary) ∧ Parent(Mary, Alice) hold.
o If true, conclude Grandparent(John, Alice).
3. Resolution
 Used to prove statements by refutation, i.e., showing that the negation of a
statement leads to a contradiction.

(2 Marks)

14. a) Explain the following (i) Definition of POMDP. [5M]


Scheme: Award 3 Marks for definition of POMDP
Answer: POMDP: In a partially observable Markov Decision Process (POMDP),
the agent does not have complete information about the current state of the
environment.
• Unlike a standard Markov Decision Process (MDP), where the agent has
full observability of the state, in a POMDP, the agent receives partial or
incomplete information, typically through observations. (3 Marks)

ii) Decision Cycle of a POMDP agent?


Scheme: Award 2 Marks for the following.
The decision cycle of a POMDP (Partially Observable Markov Decision
Process) agent outlines the steps the agent follows to make decisions in an uncertain
environment where the full state of the system is not directly observable. A POMDP
extends a standard MDP by incorporating partial observability, making it a powerful
framework for real-world problems.
Steps in the Decision Cycle of a POMDP Agent
1. Belief State Update:
o Since the agent cannot directly observe the true state of the
environment, it maintains a belief state b(s), which is a probability
distribution over all possible states s.
 After taking an action and receiving an observation, the
belief state is updated using Bayesian inference to account
for the new information
2. Policy Selection:
o The agent uses a policy π(b) to decide on the next action a based on
the current belief state b.
o A policy is a mapping from belief states to actions: π(b)=a
o The goal is to choose the action that maximizes the expected reward
over the future belief states, considering the uncertainty and
observations.
3. Action Execution:
o The agent executes the selected action a in the environment.
o This action affects the true (hidden) state of the system according to
the transition model T(s′∣s,a).
4. Observation Reception:
o After executing the action, the agent receives an observation o from
the environment.
o Observations provide indirect information about the current state of
the system.
5. Reward Collection:
o The agent receives an immediate reward R(s,a) based on the action
a taken and the true (hidden) state s.
o This reward is used to guide the agent's decisions and evaluate the
effectiveness of its policy.
6. Belief State Update (Repeat):
o Using the new observation o, the agent updates its belief state b′(s′)
as described in Step 1.
o The updated belief state becomes the basis for the next decision
cycle. (2Marks)
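For completeness (standard POMDP notation, added beyond the prescribed answer), the belief update in Steps 1 and 6 can be written as:
b′(s′) = η · O(o ∣ s′, a) · Σs T(s′ ∣ s, a) · b(s)
where O(o ∣ s′, a) is the observation model and η is a normalizing constant chosen so that the updated belief b′ sums to 1 over all states.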

b) Discuss about policy iteration algorithm for calculating an optimal policy? [5M]
Scheme: Award 3 Marks for Algorithm and 2 Marks for Explanation.
Answer:
(3Marks)

Policy Iteration is an iterative algorithm used to solve Markov Decision Processes
(MDPs) by finding the optimal policy that maximizes the expected rewards over
time. It alternates between two main steps: policy evaluation and policy
improvement.

1. Policy Evaluation: Given a policy π, compute its value function Vπ(s), which
represents the expected reward starting from state s and following π. This
is done iteratively by solving the Bellman equation.
2. Policy Improvement: Update the policy π by making it greedy with respect
to the current value function Vπ(s). The new policy chooses actions that
maximize the expected rewards.
3. These two steps are repeated until the policy converges to the optimal policy
π, where no further improvement is possible.
4. The algorithm guarantees convergence in a finite number of iterations if the
MDP is finite and has a stationary policy.
5. Policy iteration is computationally efficient when the state and action spaces
are small, as it converges faster than value iteration in some cases.
6. It alternates between improving the policy and updating the value estimates,
progressively refining both.
7. The algorithm uses dynamic programming principles and requires knowledge
of the transition probabilities and reward function.
8. Policy iteration ensures that the optimal policy is discovered because each
iteration strictly improves the policy.
9. The stopping criterion is when the policy no longer changes between
iterations, indicating optimality.
10. This algorithm is widely used in reinforcement learning and planning
problems where MDPs are modeled explicitly. (2 Marks)
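A condensed Python sketch of the policy evaluation / policy improvement loop described above (a tabular MDP with a known transition model P and reward table R is assumed; variable names are illustrative):

import numpy as np

def policy_iteration(P, R, gamma=0.9, tol=1e-6):
    # P[s][a] = list of (probability, next_state); R[s][a] = immediate reward.
    n_states, n_actions = len(P), len(P[0])
    policy = np.zeros(n_states, dtype=int)
    V = np.zeros(n_states)
    while True:
        # Policy evaluation: iterate the Bellman equation for the current policy.
        while True:
            delta = 0.0
            for s in range(n_states):
                a = policy[s]
                v_new = R[s][a] + gamma * sum(p * V[s2] for p, s2 in P[s][a])
                delta = max(delta, abs(v_new - V[s]))
                V[s] = v_new
            if delta < tol:
                break
        # Policy improvement: act greedily with respect to the evaluated V.
        stable = True
        for s in range(n_states):
            q = [R[s][a] + gamma * sum(p * V[s2] for p, s2 in P[s][a])
                 for a in range(n_actions)]
            best = int(np.argmax(q))
            if best != policy[s]:
                policy[s], stable = best, False
        if stable:
            return policy, V   # no change between iterations => optimal policy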

15. a) Discuss the process and significance of replicating artistic styles using [5M]
convolutional filters?
Scheme: Award 3 Marks for any diagram and 2 Marks for explanation (or)
(Award 5 Marks for entire explanation)

• The goal of neural style is to be able to take an arbitrary photograph and
re-render it as if it were painted in the style of a given artwork.
• Clever manipulation of convolutional filters can produce spectacular results
on this problem.

(5 Marks)
b) Explain the importance of visualizing learning in convolutional networks and how it [5M]
helps improve model performance.
Scheme:Award 3 Marks for Diagram of CNN and 2 Marks explanation of
importance.
The Convolutional Layer is a fundamental building block in Convolutional Neural
Networks (CNNs), a class of deep learning models designed for processing
structured grid data, such as images. The convolutional layer plays a crucial role in
enabling CNNs to learn hierarchical representations of visual features directly from
raw input data. Here's an explanation of the structure and purpose of the
convolutional layer:

Structure of a Convolutional Layer:


Filters (Kernels):
The convolutional layer is composed of a set of learnable filters (also known as
kernels or convolutional kernels). Each filter is a small, spatially local receptive field
that slides over the input data.
Local Receptive Fields:
Filters are applied to local receptive fields in the input. The size of these receptive
fields is determined by the dimensions of the filters. The filters slide (or convolve)
across the input, extracting local features at each position.
Convolution Operation:
The convolution operation involves taking the dot product between the filter and the
values in the local receptive field. This operation is applied at every spatial position
in the input, producing an output feature map.
Activation Function:
Each element in the resulting feature map goes through an activation function (e.g.,
ReLU) to introduce non-linearity. This enables the network to learn complex and
hierarchical representations.
Strides:
Stride refers to the step size with which the filter moves across the input. It
determines how much the filter shifts during each application. Strides influence the
spatial dimensions of the output feature map.
Padding:
Padding involves adding extra border pixels around the input. It helps preserve the
spatial dimensions of the input and prevents a reduction in size after convolution.
Common padding values include 'valid' (no padding) and 'same' (padding to keep
output size the same as the input).
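As a quick illustration of how filter size, stride, and padding interact (the numbers are assumed, not taken from the question):

def conv_output_size(n_in, kernel, stride, padding):
    # Standard formula: out = floor((n_in + 2*padding - kernel) / stride) + 1
    return (n_in + 2 * padding - kernel) // stride + 1

print(conv_output_size(32, 3, 1, 1))   # 32 -> 'same'-style padding preserves size
print(conv_output_size(32, 3, 1, 0))   # 30 -> 'valid' (no padding) shrinks the map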
How it improves model performance:

Visualizing learning in Convolutional Neural Networks (CNNs) is crucial for
understanding how models process data and extract features. It helps interpret the
network's focus at different layers, revealing whether it captures meaningful patterns
or gets distracted by irrelevant details. Visualization tools, like Grad-CAM or
saliency maps, aid in debugging and detecting overfitting or underfitting. They also
help optimize network architectures by identifying redundant layers or bottlenecks.
By analyzing activations and filters, we can refine data preprocessing and
augmentation strategies, ensuring better generalization. Visualizations enhance trust
and interpretability, especially in critical domains like medical imaging, where
reasoning must be explainable. They also uncover dataset biases, guiding data
balancing efforts. For transfer learning, they show how well pre-trained features
apply to new tasks. Overall, visualizations improve performance, efficiency, and
transparency in CNN models.

16. a) What is the significance of the explore versus exploit trade-off and explain [5M]
techniques like epsilon-greedy to handle it.
Scheme: Award 3 Marks for the significance below and 2 Marks for the techniques.

Answer: The explore-exploit trade-off is a fundamental concept in reinforcement
learning and decision-making under uncertainty. It refers to the challenge of
balancing:

1. Exploration: Trying new actions to gather more information about the
environment and discover potentially better options (e.g., higher rewards).
2. Exploitation: Leveraging existing knowledge to select the action that is
currently estimated to provide the highest reward.

Finding the right balance is critical because:


 Too much exploration can result in suboptimal performance due to
spending excessive time on actions that may not yield rewards.
 Too much exploitation can cause the agent to miss opportunities to learn
about better actions, leading to local optima instead of the global optimum.
(3 Marks)
Techniques to Handle the Trade-Off
1. Epsilon-Greedy Strategy:
o This is one of the simplest methods for balancing exploration and
exploitation.
o The agent chooses a random action (exploration) with probability ε and the
action with the highest expected reward (exploitation) with probability 1 − ε.
o ε is typically decreased over time to allow more exploration early in learning
and more exploitation as the agent gains confidence in its estimates.
Example:
o Initially, ε = 0.1: the agent explores 10% of the time and exploits 90% of the
time.
o Gradually, ε might decay to 0.01, reducing exploration to 1%. (2 Marks)
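A minimal Python sketch of epsilon-greedy action selection with a decaying ε (the Q-value table and decay schedule are assumptions for illustration):

import random

def epsilon_greedy(q_values, epsilon):
    # Explore with probability epsilon, otherwise exploit the best-known action.
    if random.random() < epsilon:
        return random.randrange(len(q_values))                        # exploration
    return max(range(len(q_values)), key=lambda a: q_values[a])       # exploitation

# Illustrative decay schedule: start at 0.1 and decay toward 0.01 during training.
epsilon, eps_min, decay = 0.1, 0.01, 0.995
for step in range(1000):
    action = epsilon_greedy([0.2, 0.5, 0.1], epsilon)
    epsilon = max(eps_min, epsilon * decay)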

b) Describe the architecture and working of Deep Q-Networks(DQN) and how it [5M]
addresses the scalability problem in Q learning.
Scheme: Award 3 Marks for the explanation of Deep Q-Networks below and 2
Marks for addressing scalability.
Architecture and Working of Deep Q-Networks (DQN)

Deep Q-Networks (DQN) combine Q-Learning, a popular reinforcement learning
algorithm, with deep neural networks to address problems with high-dimensional
state spaces. Developed by DeepMind in 2013, DQN revolutionized reinforcement
learning by enabling agents to learn from raw sensory inputs like images.

1. Architecture
The architecture of a DQN typically involves:
 Input Layer: Takes the state representation as input. For example, in
environments like Atari games, the input might be raw pixel values of the
game screen.
 Convolutional Layers: Used for feature extraction when the state space
consists of image-like data. These layers extract spatial and temporal
features from the input.
 Fully Connected Layers: Map the features to the Q-values of all possible
actions in the given state.
 Output Layer: Produces a Q-value for each action in the action space,
representing the agent’s estimate of the long-term reward for taking that
action in the current state.
2. Working of DQN
DQN uses a neural network to approximate the Q-value function Q(s,a), which
represents the expected cumulative reward from taking action a in state s and
following the optimal policy thereafter.
Key steps in DQN:
1. Initialize Neural Network: Randomly initialize the weights of the neural
network, which approximates Q(s,a).
2. Experience Replay: The agent stores past experiences (s,a,r,s′) in a replay
buffer to break temporal correlations between consecutive training samples
and stabilize learning.
3. Batch Training: At each training step, a random mini-batch is sampled
from the replay buffer and used to update the network weights.
4. Bellman Equation: The Q-value update is based on the Bellman equation:
Q(s,a) ≈ r + γ · max_a′ Q(s′,a′), where γ is the discount factor and r
is the immediate reward.
5. Target Network: To stabilize training, DQN uses a separate target network
Qtarget(s,a), which is periodically updated to match the main network. This
reduces oscillations in Q-value updates. (3 Marks)
3. How DQN Addresses the Scalability Problem in Q-Learning
Traditional Q-learning struggles with scalability in environments with large or
continuous state spaces due to the need to maintain and update a Q-table for all
state-action pairs. DQN addresses this problem by:
1. Function Approximation:
o Instead of maintaining a Q-table, DQN uses a deep neural network
to approximate the Q-function, making it feasible to handle high-
dimensional or continuous state spaces.
2. Experience Replay:
o Stores past experiences in a replay buffer and samples them
randomly for training. This improves data efficiency and reduces
overfitting caused by correlated updates.
3. Generalization:
o Neural networks generalize across similar states, enabling the agent
to predict Q-values for unseen states without explicit enumeration.
4. Target Network:
o Stabilizes learning by decoupling the target calculation from the Q-
value updates, reducing divergence during training.
5. Batch Updates:
o Updates weights for multiple transitions in a single step, improving
learning efficiency in complex environments.
(2 Marks)
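A condensed PyTorch-style sketch of the DQN update step described above (the network size, hyperparameters, and the contents of the replay buffer are assumptions for illustration, not a full training loop):

import random
from collections import deque
import torch
import torch.nn as nn

class QNetwork(nn.Module):
    def __init__(self, state_dim, n_actions):
        super().__init__()
        # Fully connected approximator; convolutional layers would replace this
        # stack for image inputs such as Atari frames.
        self.net = nn.Sequential(nn.Linear(state_dim, 128), nn.ReLU(),
                                 nn.Linear(128, n_actions))
    def forward(self, x):
        return self.net(x)

q_net, target_net = QNetwork(4, 2), QNetwork(4, 2)
target_net.load_state_dict(q_net.state_dict())   # periodically re-synced with q_net
optimizer = torch.optim.Adam(q_net.parameters(), lr=1e-3)
replay = deque(maxlen=10_000)                    # replay buffer of (s, a, r, s', done)
gamma = 0.99

def train_step(batch_size=32):
    if len(replay) < batch_size:
        return
    s, a, r, s2, done = map(torch.tensor, zip(*random.sample(replay, batch_size)))
    q = q_net(s.float()).gather(1, a.long().unsqueeze(1)).squeeze(1)
    with torch.no_grad():   # Bellman target computed from the frozen target network
        target = r.float() + gamma * target_net(s2.float()).max(1).values * (1 - done.float())
    loss = nn.functional.mse_loss(q, target)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()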

17. a) Illustrate simple reflex agent with neat diagram? [4M]


Scheme: Award 2 Marks for Diagram and 2 Marks for explanation

Answer: A simple reflex agent is an agent whose decision is based only on the current
percept. Simple reflex agents in AI act solely based on the current percept without
considering past experiences or the environment's future state. They operate using a
set of condition-action rules (e.g., "if condition, then action"). These agents are
suitable for environments that are fully observable and where the correct action can
be determined directly from the percept. However, they struggle in complex or
partially observable environments as they lack memory or reasoning capabilities. An
example is a vacuum cleaner that moves left or right based on whether the current
spot is dirty or clean. (2 Marks)
(2 Marks)
b) Discuss about adversarial search with an example? [3M]
Scheme:Award 2 Marks for Explanation and 1 Mark for example

Adversarial Search in AI

Adversarial search is a type of search used in decision-making for competitive
environments where agents compete against each other. It applies to games like
chess, checkers, or tic-tac-toe, where players' actions directly impact each other's
outcomes. The objective of adversarial search is to determine the optimal strategy
for an agent, assuming the opponent also plays optimally.

Key Features of Adversarial Search

1. Multi-agent Environment: It involves two or more agents with opposing
goals (e.g., maximizing vs. minimizing scores).
2. Minimax Algorithm: A common approach used to find the best move by
evaluating all possible moves and their outcomes.
o Maximizer: Tries to maximize its payoff (e.g., player in chess).
o Minimizer: Tries to minimize the maximizer’s payoff (e.g.,
opponent in chess).
3. Evaluation Function: Used to estimate the utility of a game state when the
search tree is too large to fully explore.
(2 Marks)

Example: Tic-Tac-Toe

Consider a simple 3x3 Tic-Tac-Toe game where the objective is to align three of
your marks (X or O) in a row, column, or diagonal.

Steps in Adversarial Search:

1. Game Tree: The tree starts from the current state and branches into all
possible moves for both players.
o The root node represents the current board state.
o Child nodes represent the possible moves for the maximizer and
minimizer.
2. Minimax Algorithm:
o The maximizer (e.g., Player X) chooses a move that maximizes
their chance of winning.
o The minimizer (e.g., Player O) chooses a move that minimizes
Player X's chance of winning.

Example:

 Initial State: Player X's turn, board looks like this:

X|O|
|X|
O| |

 Maximizer’s Goal: Place X in the bottom-right to form a diagonal win.


 Minimizer’s Response: Block Player X by placing O in the bottom-right.

Minimax in Action:

 The algorithm explores all possible moves for Player X and Player O.
 Evaluates states with a utility function:
o Win for X: +1
o Win for O: -1
o Draw: 0
 The optimal move is chosen based on the evaluation scores. (1 Mark)
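A short recursive minimax sketch over a generic two-ply game tree (the leaf utilities are assumed for illustration; this is not the specific tree from question 12(b)):

def minimax(node, maximizing):
    # Leaves are utility values; internal nodes are lists of child subtrees.
    if not isinstance(node, list):
        return node
    values = [minimax(child, not maximizing) for child in node]
    return max(values) if maximizing else min(values)

# Assumed tree: the maximizer moves first, then the minimizer picks a leaf.
tree = [[3, 5, 2], [9, 1, 2], [0, 7, 4]]
print(minimax(tree, maximizing=True))   # 2 = max(min(3,5,2), min(9,1,2), min(0,7,4))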

c) Discuss about forward chaining? [3M]


Scheme: Award 2 Marks for Explanation and 1 Mark for example
Forward Chaining in AI

Forward chaining is a reasoning technique used in rule-based systems to infer
conclusions from a set of known facts and rules. It is a data-driven approach,
meaning it starts from known facts, applies inference rules, and iteratively deduces
new facts until a goal is reached or no further inference can be made.

Steps in Forward Chaining

1. Start with Initial Facts: Begin with a set of known facts (data).
2. Apply Rules: Check which rules' conditions are satisfied by the current facts.
3. Infer New Facts: If a rule is satisfied, deduce new facts and add them to the
knowledge base.
4. Repeat: Continue applying rules until the goal is achieved or no new facts can be
inferred. (2Marks)

Example: Diagnosing a Fever

Knowledge Base (Rules):

1. Rule 1: If a person has a high temperature, then they have a fever.
If has_high_temperature(X) ⇒ has_fever(X).
2. Rule 2: If a person has a fever and body aches, they may have the flu.
If has_fever(X) AND has_body_aches(X) ⇒ may_have_flu(X).
3. Rule 3: If a person has a fever and a rash, they may have measles.
If has_fever(X) AND has_rash(X) ⇒ may_have_measles(X).

Initial Facts:

 John has a high temperature: has_high_temperature(John)


 John has body aches: has_body_aches(John)

Forward Chaining Process:

1. Step 1: Start with has_high_temperature(John) and has_body_aches(John).
2. Step 2: Apply Rule 1:
o has_high_temperature(John) → Deduce has_fever(John).
3. Step 3: Apply Rule 2:
o has_fever(John) AND has_body_aches(John) → Deduce
may_have_flu(John).
4. Step 4: Check for other rules:
o No further deductions can be made as no rash is observed, so Rule 3 is
not triggered. (1Mark)
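A minimal Python sketch of this data-driven process over the fever rules above (the rule representation as premise-set/conclusion pairs is my own, assumed for illustration):

def forward_chain(facts, rules):
    # rules: list of (premises, conclusion); facts: set of known atoms.
    # Repeatedly fire every rule whose premises are all already derived.
    derived = set(facts)
    changed = True
    while changed:
        changed = False
        for premises, conclusion in rules:
            if premises <= derived and conclusion not in derived:
                derived.add(conclusion)
                changed = True
    return derived

rules = [({"has_high_temperature(John)"}, "has_fever(John)"),
         ({"has_fever(John)", "has_body_aches(John)"}, "may_have_flu(John)"),
         ({"has_fever(John)", "has_rash(John)"}, "may_have_measles(John)")]
facts = {"has_high_temperature(John)", "has_body_aches(John)"}
print(forward_chain(facts, rules))
# derives has_fever(John) and may_have_flu(John); Rule 3 never fires (no rash)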

18. a) Briefly discuss about utility function? [4M]


Scheme: Award 4 Marks for the following or any relevant answer.

Utility Function in AI

A utility function in AI is a mathematical representation used to measure the
desirability or value of an outcome. It quantifies how preferable one state or action
is compared to others, guiding an agent to make decisions that maximize its overall
utility.

Key Features of Utility Functions

1. Quantifies Preferences: Assigns numerical values to different outcomes or
states.
2. Guides Decision-Making: Helps agents choose actions that lead to the
most desirable outcomes.
3. Handles Uncertainty: Often used in probabilistic reasoning, where the
utility of uncertain outcomes is combined with their probabilities (expected
utility).
4. Consistency: Ensures rationality in decision-making by always selecting
the action with the highest utility.

Example of a Utility Function

In a self-driving car:
 The utility function evaluates different routes based on factors like:

o Travel time
o Fuel efficiency
o Safety
 The car selects the route with the highest utility score.

b) What are the key steps involved in training a convolutional network for the CIFAR- [3M]
10 data set?
Scheme:Award 3 Marks for the following
Answer:

Key Steps to Train a CNN on CIFAR-10 Dataset

1. Load and Preprocess Data:
Load the CIFAR-10 dataset and preprocess it by normalizing pixel values,
one-hot encoding labels, and applying data augmentation to increase
variability.
2. Define CNN Architecture:
Create a CNN with convolutional layers for feature extraction, pooling
layers for dimensionality reduction, and fully connected layers for
classification.
3. Compile the Model:
Configure the model by selecting a suitable loss function (e.g., cross-
entropy), optimizer (e.g., Adam or SGD), and evaluation metrics (e.g.,
accuracy).
4. Train the Model:
Train the CNN on the training set using mini-batches, specifying the
number of epochs, and validating the performance on a validation set.
5. Evaluate the Model:
Test the trained model on unseen test data to measure its accuracy and other
performance metrics like precision, recall, or F1-score.
6. Fine-Tune the Model:
Adjust hyperparameters, apply regularization techniques like dropout, and
further augment the data to improve generalization.
7. Visualize Results:
Plot training and validation accuracy/loss curves and analyze
misclassifications to identify potential areas of improvement.
8. Save the Model:
Save the trained model for future use in deployment or further training.

Conclusion

These steps help ensure a structured, efficient process for training a CNN,
optimizing its performance, and preparing it for real-world applications.

(3 Marks)
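A compact Keras sketch of the pipeline above (architecture, hyperparameters, and file name are assumed; it is an illustration of the listed steps, not a prescribed solution):

import tensorflow as tf
from tensorflow.keras import layers, models

# Step 1: load and preprocess (normalize pixels to [0, 1], one-hot encode labels).
(x_train, y_train), (x_test, y_test) = tf.keras.datasets.cifar10.load_data()
x_train, x_test = x_train / 255.0, x_test / 255.0
y_train = tf.keras.utils.to_categorical(y_train, 10)
y_test = tf.keras.utils.to_categorical(y_test, 10)

# Step 2: define a small CNN for 32x32x3 inputs and 10 classes.
model = models.Sequential([
    layers.Conv2D(32, 3, activation='relu', input_shape=(32, 32, 3)),
    layers.MaxPooling2D(),
    layers.Conv2D(64, 3, activation='relu'),
    layers.MaxPooling2D(),
    layers.Flatten(),
    layers.Dense(64, activation='relu'),
    layers.Dense(10, activation='softmax'),
])

# Step 3: compile with loss, optimizer, and metric.
model.compile(optimizer='adam',
              loss='categorical_crossentropy',
              metrics=['accuracy'])

# Steps 4-5: train with a validation split, then evaluate on the test set.
model.fit(x_train, y_train, epochs=10, batch_size=64, validation_split=0.1)
model.evaluate(x_test, y_test)

# Step 8: save for later deployment or fine-tuning.
model.save('cifar10_cnn.keras')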
c) Briefly explain the pole-cart problem and how policy gradients solve it? [3M]
Scheme: Award 2 Marks for explanation of pole cart problem and 1 Mark for
how policy gradients solve it
Answer:
Pole-Cart Problem
The pole-cart problem (or cart-pole problem) is a classic reinforcement learning
(RL) environment where the goal is to balance a pole on a moving cart. The system
includes:
1. Cart: A movable platform on a 1D track.
2. Pole: An upright stick attached to the cart by a pivot.
3. Objective: Prevent the pole from falling over or the cart from going out of
bounds by applying forces (left or right) to the cart.
The agent learns to control the cart to keep the pole balanced for as long as
possible.
Challenges
 Continuous Feedback: The pole continuously moves due to gravity and
cart motion, requiring real-time decisions.
 Dynamic System: The agent must account for non-linear dynamics like
momentum and inertia.
 Delayed Reward: Success (keeping the pole balanced) depends on
sequences of actions.
Policy Gradient Solution
Policy Gradient Methods directly optimize the policy (a probability distribution
over actions) rather than using a value function like Q-learning. Here's how it
solves the pole-cart problem:
1. Define the Policy: The policy is parameterized by a neural network that
outputs probabilities for applying force left or right.
2. Sample Actions: Based on the policy's output, actions are sampled
probabilistically, allowing exploration.
3. Reward Signal: The agent receives a reward at each time step, e.g., +1 for
keeping the pole balanced.
4. Compute Gradients: The gradients of the policy parameters are computed
using the policy gradient theorem, which links rewards to the likelihood of
actions taken.
5. Update Policy: The policy is updated to maximize the cumulative reward,
encouraging actions that lead to long-term stability.
Advantages of Policy Gradients
 Handles Continuous Action Spaces: Policy gradients can directly model
continuous actions (e.g., fine-tuning forces on the cart).
 Exploration: Stochastic policies allow exploration of diverse strategies.
 Scalability: Works well in environments with large state and action spaces.
(3 Marks)
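A compressed REINFORCE-style sketch of the update in steps 1-5 (the 4-dimensional cart-pole state, two discrete actions, and network size are assumptions; the episode data is expected to come from a Gym-like environment loop not shown here):

import torch
import torch.nn as nn

policy = nn.Sequential(nn.Linear(4, 32), nn.ReLU(),
                       nn.Linear(32, 2), nn.Softmax(dim=-1))   # probabilities for left/right force
optimizer = torch.optim.Adam(policy.parameters(), lr=1e-2)
gamma = 0.99

def update_episode(states, actions, rewards):
    # states: list of 1-D float tensors; actions: list of ints; rewards: list of floats.
    # Discounted returns G_t, accumulated backward over the episode.
    returns, G = [], 0.0
    for r in reversed(rewards):
        G = r + gamma * G
        returns.insert(0, G)
    returns = torch.tensor(returns)
    probs = policy(torch.stack(states))                         # action probabilities per time step
    log_probs = torch.log(probs.gather(1, torch.tensor(actions).unsqueeze(1)).squeeze(1))
    loss = -(log_probs * returns).mean()                        # ascend the policy gradient
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()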
