Answer-key (15)
S.No    PART-A (Max. Marks: 20)    Marks
1 Mention different types of Agents? [2M]
Scheme: Listing of 4 types of agents - 2 Marks
Answer:
1. Simple reflex agent
2. Model-based agent
3. Goal-based agent
4. Utility-based agent
2 What is meant by heuristic? [2M]
Scheme: 2 marks for the following
Answer:
Heuristic: A heuristic function, also known as a heuristic, is a function that
estimates the cost or value associated with reaching a goal state from a given state
in a search problem. The heuristic function is used in heuristic search algorithms
to guide the search toward more promising paths.
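Illustrative sketch (Python): a simple admissible heuristic for grid path-finding, the Manhattan distance. The function name and the example coordinates are my own choices, shown only to make the definition concrete.

def manhattan_heuristic(state, goal):
    """Estimate the remaining cost from state to goal on a grid.

    Both arguments are (row, column) tuples. For 4-directional movement
    this never overestimates the true cost, so it is admissible for A*.
    """
    return abs(state[0] - goal[0]) + abs(state[1] - goal[1])

# Example: estimated cost from cell (1, 2) to the goal (4, 6) is 3 + 4 = 7.
print(manhattan_heuristic((1, 2), (4, 6)))  # 7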
5 What is a Convolutional Neural Network and how does it differ from traditional [2M]
neural networks?
Scheme: 2 marks for the following
Answer: A convolutional neural network (CNN) is a type of artificial neural
network designed specifically for image recognition and processing. It excels at
identifying patterns in images through an architecture built around convolutional
layers, which extract local features from an image far more effectively than
traditional neural networks that treat every input dimension generically. In short,
CNNs are optimized for visual data, while traditional neural networks can be
applied to various data types such as text or numerical data.
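Illustrative sketch (Python, assuming PyTorch is available; the layer sizes are my own choices). It shows the structural difference: a convolutional layer shares a small kernel across the whole image, so its parameter count does not grow with image size, unlike a fully connected layer on the flattened image.

import torch.nn as nn

conv = nn.Conv2d(in_channels=3, out_channels=16, kernel_size=3)  # 16 shared 3x3 kernels
dense = nn.Linear(in_features=3 * 32 * 32, out_features=16)      # flattened 32x32 RGB image

def count_params(m):
    return sum(p.numel() for p in m.parameters())

print(count_params(conv))   # 448   = 16*(3*3*3) weights + 16 biases, independent of image size
print(count_params(dense))  # 49168 = 16*3072 weights + 16 biases, grows with image size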
6 Define Markov Decision Process (MDP)? [2M]
Scheme: 2 marks for any two of the following
Answer: A Markov Decision Process (MDP) is a mathematical framework used
to model decision-making situations where outcomes are partly random and partly
under the control of a decision-maker. It provides a formal way to model problems
in various domains, such as robotics, economics, healthcare, and artificial
intelligence, particularly reinforcement learning.
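Illustrative sketch (Python): a toy two-state MDP written out as the tuple (S, A, P, R, γ). The numbers are my own and only illustrate the formalism.

states = ["s0", "s1"]
actions = ["stay", "move"]
gamma = 0.9  # discount factor

# P[(s, a)] -> list of (probability, next_state); R[(s, a)] -> expected reward
P = {
    ("s0", "stay"): [(1.0, "s0")],
    ("s0", "move"): [(0.8, "s1"), (0.2, "s0")],  # outcome is partly random
    ("s1", "stay"): [(1.0, "s1")],
    ("s1", "move"): [(1.0, "s0")],
}
R = {("s0", "stay"): 0.0, ("s0", "move"): 1.0,
     ("s1", "stay"): 2.0, ("s1", "move"): 0.0}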
7 What do you mean by rationality? [2M]
Scheme: 2 Marks for the following
Rationality in AI refers to the ability of an artificial intelligence system or agent
to make decisions or take actions that maximize its performance or achieve its
goals based on the knowledge it has about the environment. Rationality is a
cornerstone of intelligent behavior and is central to the design of AI systems.
8 What is uncertainty? [2M]
Scheme: 2 marks for the following
Uncertainty in AI refers to situations where an AI system lacks complete or
perfect information about the environment, outcomes, or future events. This
uncertainty arises because real-world environments are often dynamic, noisy, and
complex, making it impossible for the system to know everything or predict
outcomes with absolute certainty.
9 Define max pooling in CNNs. [2M]
Scheme: 2 marks for the following
Max pooling is a down-sampling operation commonly used in Convolutional
Neural Networks (CNNs) to reduce the spatial dimensions (height and width) of
feature maps while retaining the most important information. This operation helps
make the model more computationally efficient and robust to small spatial
variations in the input.
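Illustrative sketch (Python with NumPy; the array values are my own) of 2x2 max pooling with stride 2:

import numpy as np

def max_pool_2x2(feature_map):
    """Down-sample a 2D feature map by keeping the maximum of each 2x2 block."""
    h, w = feature_map.shape
    trimmed = feature_map[:h - h % 2, :w - w % 2]          # drop odd border rows/columns
    return trimmed.reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))

fm = np.array([[1, 3, 2, 0],
               [4, 2, 1, 5],
               [0, 1, 3, 2],
               [2, 2, 0, 1]])
print(max_pool_2x2(fm))
# [[4 5]
#  [2 3]]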
10 Mention the significance of experience replay in DQN? [2M]
Scheme: 2 marks for the following
Experience Replay is a key technique used in Deep Q-Learning (DQN) to
improve the stability, efficiency, and performance of training. It involves storing
the agent's past experiences in a replay buffer and sampling mini-batches of these
experiences for training, rather than training on each experience in sequential
order.
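Illustrative sketch (Python): a minimal replay buffer. The class and method names are my own, not taken from any particular library.

import random
from collections import deque

class ReplayBuffer:
    """Fixed-size store of (state, action, reward, next_state, done) transitions."""

    def __init__(self, capacity=10000):
        self.buffer = deque(maxlen=capacity)   # oldest experiences are discarded first

    def add(self, state, action, reward, next_state, done):
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size=32):
        # Uniform random sampling breaks the temporal correlation
        # between consecutive transitions and stabilizes training.
        return random.sample(self.buffer, batch_size)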
S.No    PART-B (Max. Marks: 50)    Marks
11. a) Explain the properties of a task environment? [5M]
Scheme: Award 3 Marks for listing of any 3 properties of below and 2 Marks
for Explanation of them
Answer:
A task environment in Artificial Intelligence (AI) refers to the setting in which an
intelligent agent operates to achieve a specific goal. Understanding the properties
of a task environment is crucial for designing effective AI systems because these
properties dictate the challenges and constraints the agent faces.
Here are the key properties of a task environment:
1. Observability
2. Deterministic vs. Stochastic
3. Static vs. Dynamic
(3 Marks)
1. Observability
Fully Observable:
o The agent has complete access to all relevant information about the
environment's state at any given time.
o Example: Chess, where the entire board state is visible to both
players.
Partially Observable:
o The agent has limited or noisy access to the environment's state due
to incomplete data or sensor limitations.
o Example: Autonomous driving, where the car might not see around
corners or in poor weather.
2. Deterministic vs. Stochastic
Deterministic:
o The next state of the environment is completely determined by the
current state and the agent's actions.
o Example: Solving a math problem.
Stochastic:
o There is some randomness in the environment's response to an
action, making the outcomes uncertain.
o Example: Poker, where the result depends on hidden cards and other
players' strategies.
3. Static vs. Dynamic
Static:
o The environment does not change while the agent is deciding on an
action.
o Example: Solving a crossword puzzle, where the puzzle remains
unchanged during the process.
Dynamic:
o The environment evolves over time, independent of the agent’s
actions, requiring the agent to act quickly.
o Example: Real-time strategy games or stock market trading.
4. Discrete vs. Continuous
Discrete:
o The environment has a finite set of states, actions, and time steps.
o Example: Board games like chess or tic-tac-toe.
Continuous:
o The environment has continuous states, actions, or time, making it
more complex to model.
o Example: Driving a car, where speed and steering angles can take
continuous values.
5. Single-Agent vs. Multi-Agent
Single-Agent:
o There is only one agent interacting with the environment.
o Example: A robot vacuum cleaner in an empty room.
Multi-Agent:
o Multiple agents operate in the environment, which could involve
cooperation, competition, or both.
o Example: Soccer matches, where multiple agents (players) interact
with each other.
6. Episodic vs. Sequential
Episodic:
o The agent's experiences are divided into distinct episodes, and
actions in one episode do not affect future episodes.
o Example: Classifying images, where each image is an independent
instance.
Sequential:
o Current actions affect future states and rewards, requiring long-term
planning.
o Example: Chess or navigation tasks.
(2 Marks)
b) Can the vacuum cleaner problem be considered an AI problem? Justify your answer with a suitable discussion. [5M]
Scheme: Award 2 Marks for Answer Yes. Award 3 Marks for Justification.
Answer:
Yes, the vacuum cleaner problem can be considered an AI problem, as it involves
designing an intelligent agent to achieve a specific goal—cleaning a predefined area
efficiently—while interacting with its environment. This problem is a classic
example used in AI to demonstrate how agents can operate in different task
environments. (2 Marks)
1. Goal-Oriented Behavior
Objective: The vacuum cleaner's goal is to clean all dirty tiles in the
environment.
The agent needs to plan its actions (move, suck dirt, etc.) to achieve this
goal efficiently.
2. Perception and Action
The agent (vacuum cleaner) perceives the environment (e.g., whether a tile
is clean or dirty) through sensors.
Based on the current perception, the agent performs actions (e.g., moving to
a new tile or cleaning the current tile).
b) Obtain the optimal path using minimax search on the following tree/graph structure?
(5 Marks)
13. a) Enumerate the steps in the process of resolution? [5M]
Scheme: Award 1 Mark for defining resolution and 4 Marks for Writing steps.
Answer:
The process of resolution is a fundamental inference technique used in
propositional logic and first-order logic to prove the unsatisfiability of a set of
logical statements or derive conclusions. Below are the key steps involved in the
resolution process: (1 Mark)
Steps for Resolution:
1. Convert all sentences in the knowledge base into Conjunctive Normal Form (CNF).
2. Negate the statement to be proved and add the resulting clause(s) to the knowledge base.
3. Repeatedly select pairs of clauses containing complementary literals and resolve them, adding each resolvent to the clause set.
4. If the empty clause is derived, a contradiction has been reached and the original statement is proved; if no new clauses can be generated, the statement cannot be derived.
(4 Marks)
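Illustrative sketch (Python) of the core resolution step for propositional clauses; clauses are represented as frozensets of literal strings, and the example knowledge base is my own.

def resolve(c1, c2):
    """Return all resolvents of two clauses (frozensets of literals like 'P' or '~P')."""
    resolvents = set()
    for lit in c1:
        comp = lit[1:] if lit.startswith("~") else "~" + lit
        if comp in c2:
            resolvents.add(frozenset((c1 - {lit}) | (c2 - {comp})))
    return resolvents

# Proving Q from {P, P -> Q} by refutation: clauses {P}, {~P, Q} and the negated goal {~Q}.
print(resolve(frozenset({"~P", "Q"}), frozenset({"~Q"})))  # {frozenset({'~P'})}
# Resolving that resolvent {~P} with {P} yields the empty clause, so Q is proved.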
b) Explain about inference in first order logic with suitable example? [5M]
Scheme: Award 3 Marks for definition and notations and 2 Marks for
explaining any one of mechanisms given.
Answer:
Inference in First-Order Logic (FOL) refers to the process of deriving new facts
or conclusions from existing knowledge and logical rules. It builds upon
propositional logic by incorporating quantifiers, predicates, functions, and variables,
enabling more expressive representations of knowledge. The goal of inference in
FOL is to determine whether a statement logically follows from a set of premises.
Key Concepts in First-Order Logic
1. Predicates: Represent relationships or properties (e.g., Likes(John, IceCream)).
2. Quantifiers:
o Universal Quantifier (∀): Means "for all" (e.g., ∀x Likes(x, IceCream)).
o Existential Quantifier (∃): Means "there exists" (e.g., ∃x Likes(x, IceCream)).
3. Terms:
o Constants: Specific entities (e.g., John, Mary).
o Variables: Represent general entities (e.g., x, y).
o Functions: Map entities to other entities (e.g., Mother(John)). (3 Marks)
Inference Mechanisms in FOL
1. Forward Chaining
Starts with known facts and applies inference rules to derive new facts.
Works well in knowledge bases with Horn clauses (clauses with at most one
positive literal).
Example:
1. Knowledge Base:
o Parent(John, Mary)
o Parent(Mary, Alice)
o ∀x, y, z  Parent(x, y) ∧ Parent(y, z) ⟹ Grandparent(x, z)
2. Goal: Prove Grandparent(John, Alice).
3. Process:
o Match Parent(John, Mary) and Parent(Mary, Alice) with the rule's antecedent.
o Derive Grandparent(John, Alice).
2. Backward Chaining
Starts with the goal and works backward to determine if it can be derived
from known facts and rules.
Useful for goal-driven reasoning.
Example:
1. Knowledge Base:
o Parent(John, Mary)
o Parent(Mary, Alice)
o ∀x, y, z  Parent(x, y) ∧ Parent(y, z) ⟹ Grandparent(x, z)
2. Goal: Prove Grandparent(John, Alice).
3. Process:
o Check whether Parent(John, Mary) ∧ Parent(Mary, Alice) hold.
o If true, conclude Grandparent(John, Alice).
3. Resolution
Used to prove statements by refutation, i.e., showing that the negation of a
statement leads to a contradiction.
(2 Marks)
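Illustrative sketch (Python) of forward chaining for the grandparent rule above; the fact representation is my own choice.

facts = {("Parent", "John", "Mary"), ("Parent", "Mary", "Alice")}

def forward_chain_grandparents(facts):
    """Apply Parent(x, y) ∧ Parent(y, z) ⟹ Grandparent(x, z) until no new facts appear."""
    derived = set(facts)
    changed = True
    while changed:
        changed = False
        parents = [f for f in derived if f[0] == "Parent"]
        for (_, x, y1) in parents:
            for (_, y2, z) in parents:
                if y1 == y2 and ("Grandparent", x, z) not in derived:
                    derived.add(("Grandparent", x, z))
                    changed = True
    return derived

print(("Grandparent", "John", "Alice") in forward_chain_grandparents(facts))  # True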
b) Discuss the policy iteration algorithm for calculating an optimal policy? [5M]
Scheme: Award 3 Marks for Algorithm and 2 Marks for Explanation.
Answer:
(3 Marks)
1. Policy Evaluation: Given a policy π, compute its value function Vπ(s), which
represents the expected reward starting from state s and following π. This
is done iteratively by solving the Bellman equation.
2. Policy Improvement: Update the policy π by making it greedy with respect
to the current value function Vπ(s). The new policy chooses actions that
maximize the expected rewards.
3. These two steps are repeated until the policy converges to the optimal policy
π, where no further improvement is possible.
4. The algorithm guarantees convergence in a finite number of iterations if the
MDP is finite and has a stationary policy.
5. Policy iteration is computationally efficient when the state and action spaces
are small, as it converges faster than value iteration in some cases.
6. It alternates between improving the policy and updating the value estimates,
progressively refining both.
7. The algorithm uses dynamic programming principles and requires knowledge
of the transition probabilities and reward function.
8. Policy iteration ensures that the optimal policy is discovered because each
iteration strictly improves the policy.
9. The stopping criterion is when the policy no longer changes between
iterations, indicating optimality.
10. This algorithm is widely used in reinforcement learning and planning
problems where MDPs are modeled explicitly. (2 Marks)
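Illustrative sketch (Python) of policy iteration on a small tabular MDP given as dictionaries; the data format and function name are my own choices.

def policy_iteration(states, actions, P, R, gamma=0.9, theta=1e-8):
    """P[(s, a)] -> list of (prob, next_state); R[(s, a)] -> expected reward."""
    policy = {s: actions[0] for s in states}   # arbitrary initial policy
    V = {s: 0.0 for s in states}
    while True:
        # 1. Policy evaluation: iterate the Bellman equation for the current policy.
        while True:
            delta = 0.0
            for s in states:
                a = policy[s]
                v_new = R[(s, a)] + gamma * sum(p * V[s2] for p, s2 in P[(s, a)])
                delta = max(delta, abs(v_new - V[s]))
                V[s] = v_new
            if delta < theta:
                break
        # 2. Policy improvement: act greedily with respect to the current V.
        stable = True
        for s in states:
            q = {a: R[(s, a)] + gamma * sum(p * V[s2] for p, s2 in P[(s, a)])
                 for a in actions}
            best = max(q, key=q.get)
            if best != policy[s]:
                policy[s] = best
                stable = False
        if stable:                             # policy unchanged -> it is optimal
            return policy, V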
15. a) Discuss the process and significance of replicating artistic styles using [5M]
convolutional filters?
Scheme: Award 3 Marks for any diagram and 2 Marks for explanation (or)
(Award 5 Marks for entire explanation)
(5 Marks)
(3 Marks)
b) Explain the importance of visualizing learning in convolutional networks and how it [5M]
helps improve model performance.
Scheme: Award 3 Marks for Diagram of CNN and 2 Marks for explanation of importance.
The Convolutional Layer is a fundamental building block in Convolutional Neural
Networks (CNNs), a class of deep learning models designed for processing
structured grid data, such as images. The convolutional layer plays a crucial role in
enabling CNNs to learn hierarchical representations of visual features directly from
raw input data. Here's an explanation of the structure and purpose of the
convolutional layer:
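Illustrative sketch (Python, assuming PyTorch; the tiny model and names are my own) of one common way to visualize learning: capturing intermediate feature maps with a forward hook so individual filter responses can be plotted and inspected.

import torch
import torch.nn as nn

model = nn.Sequential(
    nn.Conv2d(3, 8, kernel_size=3, padding=1), nn.ReLU(),
    nn.Conv2d(8, 16, kernel_size=3, padding=1), nn.ReLU(),
)

feature_maps = {}

def save_output(name):
    def hook(module, inputs, output):
        feature_maps[name] = output.detach()   # keep the activations for inspection
    return hook

model[0].register_forward_hook(save_output("conv1"))
model[2].register_forward_hook(save_output("conv2"))

model(torch.randn(1, 3, 32, 32))               # a dummy RGB image
for name, fmap in feature_maps.items():
    print(name, fmap.shape)                    # e.g. conv1 torch.Size([1, 8, 32, 32])
# Plotting each channel of fmap[0] as a grayscale image shows which patterns
# (edges, textures, object parts) that filter has learned to respond to.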
16. a) What is the significance of the explore versus exploit trade-off, and explain techniques like epsilon-greedy to handle it. [5M]
Scheme: Award 3 marks for answer below.
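Illustrative sketch (Python) of epsilon-greedy action selection; the function and variable names are my own. With probability ε the agent explores a random action, otherwise it exploits the action with the highest current Q-value estimate; in practice ε is usually decayed over training so the agent explores heavily at first and exploits more later.

import random

def epsilon_greedy(q_values, epsilon=0.1):
    """q_values: dict mapping each action to its estimated Q-value."""
    if random.random() < epsilon:
        return random.choice(list(q_values))   # explore
    return max(q_values, key=q_values.get)     # exploit

# Usually picks 'right', but explores a random action about 10% of the time.
print(epsilon_greedy({"left": 0.2, "right": 0.8}))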
b) Describe the architecture and working of Deep Q-Networks (DQN) and how it addresses the scalability problem in Q-learning. [5M]
Scheme: Award 2 Marks for explanation on deep Q networks below and 2
Marks addressing scalability
Architecture and Working of Deep Q-Networks (DQN)
1. Architecture
The architecture of a DQN typically involves:
Input Layer: Takes the state representation as input. For example, in
environments like Atari games, the input might be raw pixel values of the
game screen.
Convolutional Layers: Used for feature extraction when the state space
consists of image-like data. These layers extract spatial and temporal
features from the input.
Fully Connected Layers: Map the features to the Q-values of all possible
actions in the given state.
Output Layer: Produces a Q-value for each action in the action space,
representing the agent’s estimate of the long-term reward for taking that
action in the current state.
2. Working of DQN
DQN uses a neural network to approximate the Q-value function Q(s,a), which
represents the expected cumulative reward from taking action a in state s and
following the optimal policy thereafter.
Key steps in DQN:
1. Initialize Neural Network: Randomly initialize the weights of the neural
network, which approximates Q(s,a).
2. Experience Replay: The agent stores past experiences (s,a,r,s′) in a replay
buffer to break temporal correlations between consecutive training samples
and stabilize learning.
3. Batch Training: At each training step, a random mini-batch is sampled
from the replay buffer and used to update the network weights.
4. Bellman Equation: The Q-value update is based on the Bellman equation:
Q(s, a) ≈ r + γ · max_a′ Q(s′, a′)
where γ is the discount factor and r is the immediate reward.
5. Target Network: To stabilize training, DQN uses a separate target network
Qtarget(s,a), which is periodically updated to match the main network. This
reduces oscillations in Q-value updates. (3 Marks)
3. How DQN Addresses the Scalability Problem in Q-Learning
Traditional Q-learning struggles with scalability in environments with large or
continuous state spaces due to the need to maintain and update a Q-table for all
state-action pairs. DQN addresses this problem by:
1. Function Approximation:
o Instead of maintaining a Q-table, DQN uses a deep neural network
to approximate the Q-function, making it feasible to handle high-
dimensional or continuous state spaces.
2. Experience Replay:
o Stores past experiences in a replay buffer and samples them
randomly for training. This improves data efficiency and reduces
overfitting caused by correlated updates.
3. Generalization:
o Neural networks generalize across similar states, enabling the agent
to predict Q-values for unseen states without explicit enumeration.
4. Target Network:
o Stabilizes learning by decoupling the target calculation from the Q-
value updates, reducing divergence during training.
5. Batch Updates:
o Updates weights for multiple transitions in a single step, improving
learning efficiency in complex environments.
(2 Marks)
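Illustrative sketch (Python, assuming PyTorch; network sizes, hyperparameters, and names are my own) of the DQN update described above. A complete agent would add the environment interaction loop and the replay buffer shown earlier.

import torch
import torch.nn as nn

def make_q_net(state_dim=4, n_actions=2):
    return nn.Sequential(nn.Linear(state_dim, 64), nn.ReLU(),
                         nn.Linear(64, n_actions))     # one Q-value per action

q_net = make_q_net()
target_net = make_q_net()
target_net.load_state_dict(q_net.state_dict())         # target starts as a copy
optimizer = torch.optim.Adam(q_net.parameters(), lr=1e-3)
gamma = 0.99

def train_step(batch):
    """batch: list of (state, action, reward, next_state, done) transitions."""
    states, actions, rewards, next_states, dones = map(
        lambda x: torch.as_tensor(x, dtype=torch.float32), zip(*batch))
    q_pred = q_net(states).gather(1, actions.long().unsqueeze(1)).squeeze(1)
    with torch.no_grad():                               # Bellman target from the frozen network
        q_next = target_net(next_states).max(dim=1).values
        q_target = rewards + gamma * q_next * (1 - dones)
    loss = nn.functional.mse_loss(q_pred, q_target)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

# Periodically copy the weights: target_net.load_state_dict(q_net.state_dict())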
Answer: A Simple reflex agent, because its decision is based only on the current
percept. Simple reflex agents in AI act solely based on the current percept without
considering past experiences or the environment's future state. They operate using a
set of condition-action rules (e.g., "if condition, then action"). These agents are
suitable for environments that are fully observable and where the correct action can
be determined directly from the percept. However, they struggle in complex or
partially observable environments as they lack memory or reasoning capabilities. An
example is a vacuum cleaner that moves left or right based on whether the current
spot is dirty or clean. (2 Marks)
b) Discuss adversarial search with an example? [3M]
Scheme: Award 2 Marks for Explanation and 1 Mark for example
Adversarial Search in AI
Example: Tic-Tac-Toe
Consider a simple 3x3 Tic-Tac-Toe game where the objective is to align three of
your marks (X or O) in a row, column, or diagonal.
1. Game Tree: The tree starts from the current state and branches into all
possible moves for both players.
o The root node represents the current board state.
o Child nodes represent the possible moves for the maximizer and
minimizer.
2. Minimax Algorithm:
o The maximizer (e.g., Player X) chooses a move that maximizes
their chance of winning.
o The minimizer (e.g., Player O) chooses a move that minimizes
Player X's chance of winning.
Example:
X | O |
  | X |
O |   |
Minimax in Action:
The algorithm explores all possible moves for Player X and Player O.
Evaluates states with a utility function:
o Win for X: +1
o Win for O: -1
o Draw: 0
The optimal move is chosen based on the evaluation scores. (1 Mark)
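Illustrative sketch (Python) of minimax on the Tic-Tac-Toe position above; the board encoding (a list of nine cells holding 'X', 'O', or ' ') and the function names are my own.

WINS = [(0, 1, 2), (3, 4, 5), (6, 7, 8), (0, 3, 6),
        (1, 4, 7), (2, 5, 8), (0, 4, 8), (2, 4, 6)]

def winner(board):
    for a, b, c in WINS:
        if board[a] != " " and board[a] == board[b] == board[c]:
            return board[a]
    return None

def minimax(board, x_to_move):
    """Utility of the position: +1 if X wins, -1 if O wins, 0 for a draw."""
    w = winner(board)
    if w:
        return 1 if w == "X" else -1
    if " " not in board:
        return 0
    mark = "X" if x_to_move else "O"
    scores = []
    for i, cell in enumerate(board):
        if cell == " ":
            board[i] = mark                          # try the move
            scores.append(minimax(board, not x_to_move))
            board[i] = " "                           # undo it
    return max(scores) if x_to_move else min(scores)

# The position shown above (X at cells 0 and 4, O at cells 1 and 6), X to move.
board = list("XO  X O  ")
print(minimax(board, True))  # 1 -> X can force a win from this position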
1. Start with Initial Facts: Begin with a set of known facts (data).
2. Apply Rules: Check which rules' conditions are satisfied by the current facts.
3. Infer New Facts: If a rule is satisfied, deduce new facts and add them to the
knowledge base.
4. Repeat: Continue applying rules until the goal is achieved or no new facts can be
inferred. (2 Marks)
Utility Function in AI
In a self-driving car:
Utility function evaluates different routes based on factors like:
o Travel time
o Fuel efficiency
o Safety
The car selects the route with the highest utility score.
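Illustrative sketch (Python) of such a utility function for route selection; the weights and route numbers are my own, purely illustrative.

def route_utility(route, w_time=-1.0, w_fuel=-0.5, w_safety=2.0):
    """Higher is better: penalize travel time and fuel use, reward safety."""
    return (w_time * route["time_min"] +
            w_fuel * route["fuel_l"] +
            w_safety * route["safety_score"])

routes = {
    "highway": {"time_min": 30, "fuel_l": 4.0, "safety_score": 8},
    "city":    {"time_min": 45, "fuel_l": 3.0, "safety_score": 9},
}
best = max(routes, key=lambda name: route_utility(routes[name]))
print(best)  # 'highway' (utility -16.0 versus -28.5 for the city route)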
b) What are the key steps involved in training a convolutional network for the CIFAR-10 dataset? [3M]
Scheme: Award 3 Marks for the following
Answer:
1. Load and preprocess the CIFAR-10 dataset (normalize the images; optionally apply data augmentation such as random crops and horizontal flips).
2. Define the CNN architecture (convolutional, pooling, and fully connected layers).
3. Choose a loss function (categorical cross-entropy) and an optimizer (e.g., SGD or Adam).
4. Train the network in mini-batches over several epochs, monitoring training and validation accuracy.
5. Evaluate the trained model on the test set and tune hyperparameters as needed.
Conclusion
These steps help ensure a structured, efficient process for training a CNN, optimizing its performance, and preparing it for real-world applications.
(3 Marks)
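Illustrative sketch (Python, assuming PyTorch and torchvision; the architecture and hyperparameters are my own choices) condensing the steps listed above:

import torch
import torch.nn as nn
import torchvision
import torchvision.transforms as T

# 1. Load and normalize the CIFAR-10 data.
transform = T.Compose([T.ToTensor(),
                       T.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))])
train_set = torchvision.datasets.CIFAR10("data", train=True, download=True,
                                         transform=transform)
loader = torch.utils.data.DataLoader(train_set, batch_size=64, shuffle=True)

# 2. Define a small CNN: conv -> pool -> conv -> pool -> fully connected.
model = nn.Sequential(
    nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
    nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
    nn.Flatten(), nn.Linear(32 * 8 * 8, 10),        # 10 CIFAR-10 classes
)

# 3. Loss and optimizer, then 4. mini-batch training.
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
for epoch in range(2):                               # a real run uses more epochs
    for images, labels in loader:
        optimizer.zero_grad()
        loss = criterion(model(images), labels)
        loss.backward()
        optimizer.step()
# 5. Evaluate on the held-out test split and tune hyperparameters as needed.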
c) Briefly explain the pole-cart problem and how policy gradients solve it? [3M]
Scheme: Award 2 Marks for explanation of pole cart problem and 1 Mark for
how policy gradients solve it
Answer:
Pole-Cart Problem
The pole-cart problem (or cart-pole problem) is a classic reinforcement learning
(RL) environment where the goal is to balance a pole on a moving cart. The system
includes:
1. Cart: A movable platform on a 1D track.
2. Pole: An upright stick attached to the cart by a pivot.
3. Objective: Prevent the pole from falling over or the cart from going out of
bounds by applying forces (left or right) to the cart.
The agent learns to control the cart to keep the pole balanced for as long as
possible.
Challenges
Continuous Feedback: The pole continuously moves due to gravity and
cart motion, requiring real-time decisions.
Dynamic System: The agent must account for non-linear dynamics like
momentum and inertia.
Delayed Reward: Success (keeping the pole balanced) depends on
sequences of actions.
Policy Gradient Solution
Policy Gradient Methods directly optimize the policy (a probability distribution
over actions) rather than using a value function like Q-learning. Here's how it
solves the pole-cart problem:
1. Define the Policy: The policy is parameterized by a neural network that
outputs probabilities for applying force left or right.
2. Sample Actions: Based on the policy's output, actions are sampled
probabilistically, allowing exploration.
3. Reward Signal: The agent receives a reward at each time step, e.g., +1 for
keeping the pole balanced.
4. Compute Gradients: The gradients of the policy parameters are computed
using the policy gradient theorem, which links rewards to the likelihood of
actions taken.
5. Update Policy: The policy is updated to maximize the cumulative reward,
encouraging actions that lead to long-term stability.
Advantages of Policy Gradients
Handles Continuous Action Spaces: Policy gradients can directly model
continuous actions (e.g., fine-tuning forces on the cart).
Exploration: Stochastic policies allow exploration of diverse strategies.
Scalability: Works well in environments with large state and action spaces.
(3 Marks)
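Illustrative sketch (Python, assuming PyTorch and the Gymnasium CartPole environment, whose step() returns five values; network sizes and names are my own) of a REINFORCE-style policy gradient agent:

import gymnasium as gym
import torch
import torch.nn as nn

env = gym.make("CartPole-v1")
policy = nn.Sequential(nn.Linear(4, 64), nn.ReLU(),
                       nn.Linear(64, 2), nn.Softmax(dim=-1))  # P(push left), P(push right)
optimizer = torch.optim.Adam(policy.parameters(), lr=1e-2)
gamma = 0.99

for episode in range(200):
    obs, _ = env.reset()
    log_probs, rewards, done = [], [], False
    while not done:
        probs = policy(torch.as_tensor(obs, dtype=torch.float32))
        dist = torch.distributions.Categorical(probs)
        action = dist.sample()                         # stochastic policy -> exploration
        log_probs.append(dist.log_prob(action))
        obs, reward, terminated, truncated, _ = env.step(action.item())
        rewards.append(reward)
        done = terminated or truncated

    # Discounted return G_t for every step, then the policy-gradient loss.
    returns, G = [], 0.0
    for r in reversed(rewards):
        G = r + gamma * G
        returns.insert(0, G)
    returns = torch.as_tensor(returns)
    returns = (returns - returns.mean()) / (returns.std() + 1e-8)  # variance reduction
    loss = -(torch.stack(log_probs) * returns).sum()

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()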