Artificial Intelligence and Machine Learning Question Bank
COURSE OBJECTIVES:
The main objectives of this course are to:
Study uninformed and heuristic search techniques.
Learn techniques for reasoning under uncertainty.
Introduce machine learning and supervised learning algorithms.
Study ensembling and unsupervised learning algorithms.
Learn the basics of deep learning using neural networks.
UNIT I PROBLEM SOLVING
Introduction to AI - AI Applications - Problem solving agents – search algorithms –
uninformed search strategies – Heuristic search strategies – Local search and optimization
problems – adversarial search – constraint satisfaction problems (CSP)
Acting under uncertainty – Bayesian inference – naïve bayes models. Probabilistic
reasoning – Bayesian networks – exact inference in BN – approximate inference in BN –
causal networks.
PRACTICAL EXERCISES:
1. Implementation of Uninformed search algorithms (BFS, DFS)
2. Implementation of Informed search algorithms (A*, memory-bounded A*)
3. Implement naïve Bayes models
4. Implement Bayesian Networks
5. Build Regression models
6. Build decision trees and random forests
7. Build SVM models
8. Implement ensembling techniques
9. Implement clustering algorithms
www.Notesfree.in
COURSE OUTCOMES:
At the end of this course, the students will be able to:
CO1: Use appropriate search algorithms for problem solving
CO2: Apply reasoning under uncertainty
CO3: Build supervised learning models
CO4: Build ensembling and unsupervised models
CO5: Build deep learning neural network models
TEXT BOOKS:
1. Stuart Russell and Peter Norvig, “Artificial Intelligence – A Modern Approach”, Fourth
Edition, Pearson Education, 2021.
2. Ethem Alpaydin, “Introduction to Machine Learning”, MIT Press, Fourth Edition, 2020.
REFERENCES:
1. Dan W. Patterson, “Introduction to Artificial Intelligence and Expert Systems”, Pearson Education, 2007.
2. Kevin Knight, Elaine Rich, and Nair B., “Artificial Intelligence”, McGraw Hill, 2008.
3. Patrick H. Winston, “Artificial Intelligence”, Third Edition, Pearson Education, 2006.
4. Deepak Khemani, “Artificial Intelligence”, Tata McGraw Hill Education, 2013 (http://nptel.ac.in/).
5. Christopher M. Bishop, “Pattern Recognition and Machine Learning”, Springer, 2006.
6. Tom Mitchell, “Machine Learning”, McGraw Hill, 3rd Edition, 1997.
7. Charu C. Aggarwal, “Data Classification Algorithms and Applications”, CRC Press, 2014.
8. Mehryar Mohri, Afshin Rostamizadeh, Ameet Talwalkar, “Foundations of Machine Learning”, MIT Press, 2012.
9. Ian Goodfellow, Yoshua Bengio, Aaron Courville, “Deep Learning”, MIT Press, 2016.
Part A
2. What is an agent?
An agent is anything that can be viewed as perceiving its environment through
sensors and acting upon that environment through actuators.
A human agent has eyes, ears, and other organs for sensors and hands, legs,
mouth, and other body parts for actuators.
A robotic agent might have cameras and infrared range finders for sensors
and various motors for actuators.
A software agent receives keystrokes, file contents, and network packets as
sensory inputs and acts on the environment by displaying on the screen, writing files,
and sending network packets.
For each possible percept sequence, a rational agent should select an action that is
expected to maximize its performance measure, given the evidence provided by the
percept sequence and whatever built-in knowledge the agent has. A rational agent
should be autonomous.
Robotic vehicles
Speech recognition
Autonomous planning and scheduling
Game playing
Spam fighting
Logistics planning
Robotics
Machine Translation
7. Are reflex actions (such as flinching from a hot stove) rational? Are they
intelligent?
Reflex actions can be considered rational. Since the body performs the action itself,
it can be argued that reflex actions are rational because they are the result of evolutionary
adaptation. Flinching from a hot stove is a normal reaction, because the body wants to
keep itself out of danger, and getting away from something hot is a way to do that.
Reflex actions are also intelligent. Intelligence suggests that there is reasoning
Problem formulation is the process of deciding what actions and states to
consider given a goal.
12. List the steps involved in a simple problem-solving agent.
(i) Goal formulation
(ii) Problem formulation
(iii) Search
(iv) Search Algorithm
(v) Execution phase
13. What are the components of well-defined problems? (or)
What are the four components that define a problem? Define them.
The four components that define a problem are:
1) Initial state – the state in which the agent starts.
to each path.
The problem-solving agent is expected to choose a cost function that reflects
its own performance measure.
A real-world problem is one whose solutions people actually care about.
Automatic Assembly
Internet searching
Toy problem Examples:
8 – Queen problem
8 – Puzzle problem
Vacuum world problem
Completeness: Is the algorithm guaranteed to find a solution when there is
one?
Optimality: Does the strategy find the optimal solution?
Time complexity: How long does it take to find a solution?
Space complexity: How much memory is needed to perform the search?
It is a simple search strategy that is complete, i.e. it is guaranteed to find a solution if
one exists. If the depth of the search tree is small, BFS is the best choice. It is
useful in tree search as well as in graph search.
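A minimal Python sketch of BFS, assuming the state space is given as an adjacency dictionary; the graph GRAPH is a hypothetical example. Because the frontier is a FIFO queue, the first path that reaches the goal has the fewest edges.

```python
from collections import deque


def bfs(graph, start, goal):
    """Return the path with the fewest edges from start to goal, or None."""
    frontier = deque([start])          # FIFO queue: shallowest node is expanded first
    parent = {start: None}             # doubles as the set of already-reached states
    while frontier:
        node = frontier.popleft()
        if node == goal:
            path = []
            while node is not None:    # follow parent links back to the start
                path.append(node)
                node = parent[node]
            return path[::-1]
        for succ in graph.get(node, []):
            if succ not in parent:
                parent[succ] = node
                frontier.append(succ)
    return None                        # goal is unreachable


# Hypothetical example graph
GRAPH = {
    "S": ["A", "B"],
    "A": ["C", "D"],
    "B": ["D", "G"],
    "C": [],
    "D": ["G"],
    "G": [],
}

print(bfs(GRAPH, "S", "G"))
```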
1) What does the state space look like, i.e. is it tree-structured or a graph? The
critical factors for the state space are the branching factor and the depth of that tree or
graph.
2) What is the performance of the search strategy? A complete, optimal search
strategy with better time and space requirements is the critical factor in the performance of a
search strategy.
Depth-limited search
Iterative deepening depth-first search
Bidirectional search
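Depth-limited search and iterative deepening from the list above can be sketched together in Python. This is a minimal version assuming a hypothetical adjacency dictionary TREE; it skips nodes already on the current path but keeps no global explored set.

```python
def depth_limited(graph, node, goal, limit, path):
    """Recursive DFS that never goes more than `limit` edges below `node`."""
    path.append(node)
    if node == goal:
        return list(path)
    if limit > 0:
        for succ in graph.get(node, []):
            if succ not in path:               # skip nodes already on this path
                found = depth_limited(graph, succ, goal, limit - 1, path)
                if found:
                    return found
    path.pop()                                 # backtrack
    return None


def iddfs(graph, start, goal, max_depth=20):
    """Run depth-limited search with limits 0, 1, 2, ... until the goal is found."""
    for limit in range(max_depth + 1):
        result = depth_limited(graph, start, goal, limit, [])
        if result:
            return result, limit
    return None, None


# Hypothetical tree rooted at A (nodes without an entry are leaves)
TREE = {"A": ["B", "C"], "B": ["D", "E"], "C": ["F", "G"]}
print(iddfs(TREE, "A", "G"))
```

Iterative deepening re-expands shallow nodes on every pass, but it keeps the memory use of depth-first search while still finding the shallowest goal, like BFS.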
21. What is the power of heuristic search? (or) Why does one go for heuristic
search?
Heuristic search uses problem-specific knowledge while searching the state
space. This helps to improve average search performance. It uses evaluation
functions that denote the relative desirability (goodness) of expanding a node. This
makes the search more efficient and faster. One should go for heuristic search because
it has the power to solve large, hard problems in affordable time.
22. What are the advantages of a heuristic function?
A heuristic function ranks alternative paths in various search algorithms at
each branching step, based on the available information, so that a better path is
chosen. The main advantage of a heuristic function is that it guides which state to
explore next while searching. It makes use of problem-specific knowledge, such as
constraints, to check the goodness of a state to be explored. This drastically reduces
the required searching time.
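As an illustration of how a heuristic guides search, here is a minimal A* sketch in Python on a hypothetical grid ('#' marks a wall, every move costs 1). The heuristic h is the Manhattan distance, which never overestimates the remaining cost for 4-directional unit moves, so the first time the goal is popped its cost is optimal.

```python
import heapq


def a_star(grid, start, goal):
    """A* on a 4-connected grid; returns (path_cost, path) or (None, None)."""
    rows, cols = len(grid), len(grid[0])

    def h(cell):                               # Manhattan distance heuristic
        return abs(cell[0] - goal[0]) + abs(cell[1] - goal[1])

    frontier = [(h(start), 0, start)]          # priority queue ordered by f = g + h
    g_cost = {start: 0}
    parent = {start: None}
    while frontier:
        _f, g, cell = heapq.heappop(frontier)
        if cell == goal:
            path = []
            while cell is not None:
                path.append(cell)
                cell = parent[cell]
            return g, path[::-1]
        if g > g_cost[cell]:                   # stale entry: a cheaper route was found
            continue
        r, c = cell
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if 0 <= nr < rows and 0 <= nc < cols and grid[nr][nc] != "#":
                new_g = g + 1
                if new_g < g_cost.get((nr, nc), float("inf")):
                    g_cost[(nr, nc)] = new_g
                    parent[(nr, nc)] = cell
                    heapq.heappush(frontier, (new_g + h((nr, nc)), new_g, (nr, nc)))
    return None, None


GRID = [
    "....",
    ".##.",
    "....",
]
print(a_star(GRID, (0, 0), (2, 3)))
```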
23. State the reason why hill climbing often gets stuck.
Local maxima are the states where the hill-climbing algorithm is sure to get stuck.
A local maximum is a peak that is higher than each of its neighbouring states, but lower
than the global maximum. So we have missed the better state here. All the search
procedure turns out to be wasted here. It is like a dead end.
25. What do you mean by local maxima with respect to search technique?
A local maximum is a peak that is higher than each of its neighbouring states, but
lower than the global maximum, i.e. a local maximum is a small hill on the surface
whose peak is not as high as the main peak (which is the optimal solution). Hill
climbing fails to find the optimum solution when it encounters a local maximum. Any small
move from here also makes things worse (temporarily). At a local maximum, the whole
search procedure turns out to be wasted. It is like a dead end.
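The dead-end behaviour can be seen in a small Python sketch of steepest-ascent hill climbing over integer positions. The landscape values are hypothetical: the climb that starts at 0 stops on the small hill at position 2, while the climb that starts at 5 reaches the global maximum at position 8.

```python
def hill_climb(f, start, lo, hi):
    """Steepest-ascent hill climbing over the integers lo..hi.

    Moves to the better neighbour (x - 1 or x + 1) until no neighbour is higher.
    """
    x = start
    while True:
        neighbours = [n for n in (x - 1, x + 1) if lo <= n <= hi]
        best = max(neighbours, key=f)
        if f(best) <= f(x):            # no uphill move left: a local (maybe global) maximum
            return x
        x = best


# Hypothetical landscape: small hill at position 2, main peak at position 8
VALUES = [0, 3, 5, 4, 2, 4, 7, 9, 12, 8]


def landscape(x):
    return VALUES[x]


print(hill_climb(landscape, 0, 0, 9))   # stuck on the local maximum
print(hill_climb(landscape, 5, 0, 9))   # reaches the global maximum
```

Random restarts (running the climb from several starting points and keeping the best result) are one common way around this.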
Ridges and plateaus in hill climbing can be avoided using methods such as
backtracking and making big jumps. Backtracking and making big jumps help to avoid
plateaus, whereas applying multiple rules helps to avoid the problem of ridges.
Parameter | Uninformed search | Informed search
Performance | It finds a solution slowly compared to an informed search. | It finds a solution more quickly.
Completion | It is always complete. | It may or may not be complete.
Cost factor | Cost is high. | Cost is low.
Time | It consumes moderate time because of slow searching. | It consumes less time because of quick searching.
Direction | No suggestion is given regarding the solution. | There is a direction given about the solution.
Implementation | It is lengthier to implement. | It is less lengthy to implement.
Efficiency | It is comparatively less efficient, as the incurred cost is more and the speed of finding | It is more efficient, as efficiency takes into account cost and performance. The incurred cost
Examples of algorithms | a) Breadth-first search, b) Uniform-cost search, c) Depth-first search | a) Best-first search, b) Greedy search, c) A* search
In a game of chance, we can add an extra level of chance nodes to the game search tree.
These nodes have successors that are the outcomes of the random element. The
minimax algorithm (in its expectiminimax form) uses the probability P(di) attached to each
outcome di of a chance node, weighting the value of each outcome by that probability.
The successor function S(N, di) gives the moves from position N for outcome di.
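A minimal recursive Python sketch of this idea (usually called expectiminimax). The tree encoding is an assumption made for illustration: leaves are numbers, MAX and MIN nodes list their children, and chance nodes list (probability, subtree) pairs whose values are averaged.

```python
def expectiminimax(node):
    """Value of a game tree containing MAX, MIN and CHANCE nodes."""
    if isinstance(node, (int, float)):
        return node                                  # terminal utility
    kind, children = node
    if kind == "max":
        return max(expectiminimax(c) for c in children)
    if kind == "min":
        return min(expectiminimax(c) for c in children)
    if kind == "chance":
        # expected value: sum over outcomes di of P(di) * value(outcome)
        return sum(p * expectiminimax(c) for p, c in children)
    raise ValueError(f"unknown node type: {kind}")


# Hypothetical tree: MAX chooses between two moves, each followed by a random event
TREE = ("max", [
    ("chance", [(0.9, 2), (0.1, 3)]),   # expected value 2.1
    ("chance", [(0.9, 1), (0.1, 4)]),   # expected value 1.3
])
print(expectiminimax(TREE))
```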
Part B
1. Enumerate the classical “Water Jug Problem”. Describe the state space for this
problem and also give the solution.
2. How do you define a problem as a state space search? Discuss it with the help of an
example.
3. Solve the given problem. Describe the operators involved in it.
Consider a Water Jug Problem: You are given two jugs, a 4-gallon one and
a 3-gallon one. Neither has any measuring markers on it. There is a pump
that can be used to fill the jugs with water. How can you get exactly 2 gallons
of water into the 4-gallon jug? Explicit Assumptions: A jug can be filled
from the pump, water can be poured out of a jug onto the ground, water can
be poured from one jug to another, and there are no other measuring
devices available.
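One way to solve this is breadth-first search over states (x, y), where x is the amount in the 4-gallon jug and y the amount in the 3-gallon jug. The six operators in the sketch below (fill, empty, and pour in each direction) follow the stated assumptions; the function name and defaults are hypothetical. BFS returns a shortest sequence of states, here six moves long.

```python
from collections import deque


def water_jug(cap_a=4, cap_b=3, target=2):
    """BFS from (0, 0) to any state with `target` gallons in the first jug."""
    start = (0, 0)
    parent = {start: None}
    frontier = deque([start])
    while frontier:
        a, b = frontier.popleft()
        if a == target:
            path, state = [], (a, b)
            while state is not None:           # rebuild the path from parent links
                path.append(state)
                state = parent[state]
            return path[::-1]
        pour_ab = min(a, cap_b - b)            # how much fits when pouring A into B
        pour_ba = min(b, cap_a - a)            # how much fits when pouring B into A
        successors = [
            (cap_a, b),                        # fill the 4-gallon jug from the pump
            (a, cap_b),                        # fill the 3-gallon jug from the pump
            (0, b),                            # empty the 4-gallon jug on the ground
            (a, 0),                            # empty the 3-gallon jug on the ground
            (a - pour_ab, b + pour_ab),        # pour 4-gallon jug into 3-gallon jug
            (a + pour_ba, b - pour_ba),        # pour 3-gallon jug into 4-gallon jug
        ]
        for s in successors:
            if s not in parent:
                parent[s] = (a, b)
                frontier.append(s)
    return None


print(water_jug())
```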
4. Define the following problems. What type of control strategy is used in each of the
following problems?
i.The Tower of Hanoi
ii.Crypto-arithmetic
10. Explain the nature of heuristics with an example. What is the effect of heuristic
accuracy?
11. Explain the various types of hill climbing search techniques.
12. Discuss constraint satisfaction problems with an algorithm for solving a
cryptarithmetic problem.
13. Solve the following cryptarithmetic problem using a constraint satisfaction
search procedure.
CROSS
+ROADS
------------
DANGER
----------------
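A brute-force Python sketch of this puzzle as a constraint satisfaction problem. It is generate-and-test rather than a full backtracking CSP solver, with one constraint-propagation step up front: the sum of two 5-digit numbers is less than 200000, so D must be 1. Every assignment found satisfies the all-different, no-leading-zero, and column-sum constraints; one such assignment gives 96233 + 62513 = 158746.

```python
from itertools import permutations


def word_value(word, env):
    """Read a word as a decimal number under the letter-to-digit map `env`."""
    n = 0
    for ch in word:
        n = n * 10 + env[ch]
    return n


def solve_cross_roads_danger():
    letters = "CROSAGNE"                       # every distinct letter except D
    free_digits = [d for d in range(10) if d != 1]
    solutions = []
    # permutations() enforces the all-different constraint
    for perm in permutations(free_digits, len(letters)):
        env = dict(zip(letters, perm))
        if env["C"] == 0 or env["R"] == 0:     # no leading zeros
            continue
        env["D"] = 1                           # fixed by the carry argument
        if word_value("CROSS", env) + word_value("ROADS", env) == word_value("DANGER", env):
            solutions.append(env)
    return solutions


print(solve_cross_roads_danger())
```

A real CSP solver would instead assign letters column by column from the right and prune as soon as a column sum becomes inconsistent.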
14. Explain the alpha-beta pruning algorithm and the Minimax game-playing
algorithm with an example.
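A minimal Python sketch of minimax with alpha-beta pruning, assuming a game tree encoded as nested lists (leaves are utility values). The hypothetical tree has a MAX root with three MIN children; the `visited` list records which leaves were actually evaluated, showing that the leaves 4 and 6 are pruned without changing the minimax value 3.

```python
import math


def alphabeta(node, maximizing, alpha=-math.inf, beta=math.inf, visited=None):
    """Minimax value of `node`, skipping branches that cannot affect the result."""
    if not isinstance(node, list):
        if visited is not None:
            visited.append(node)                 # record evaluated leaves
        return node
    if maximizing:
        value = -math.inf
        for child in node:
            value = max(value, alphabeta(child, False, alpha, beta, visited))
            alpha = max(alpha, value)
            if alpha >= beta:
                break                            # MIN above will never allow this branch
        return value
    value = math.inf
    for child in node:
        value = min(value, alphabeta(child, True, alpha, beta, visited))
        beta = min(beta, value)
        if alpha >= beta:
            break                                # MAX above already has a better option
    return value


# Hypothetical tree: MAX root, three MIN nodes with three leaves each
TREE = [[3, 12, 8], [2, 4, 6], [14, 5, 2]]
visited = []
print(alphabeta(TREE, True, visited=visited), visited)
```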
15. Solve the given problem. Describe the operators involved in it.
Consider a water jug problem: You are given two jugs, a 4-gallon one and a
3-gallon one. Neither has any measuring markers on it. There is a pump
that can be used to fill the jugs with water. How can you get exactly 2 gallons
of water into the 4-gallon jug? Explicit Assumptions: A jug can be filled from
the pump, water can be poured out of a jug onto the ground, water can be
poured from one jug to another, and there are no other measuring
devices available.
Acting under uncertainty – Bayesian inference – naïve bayes models. Probabilistic
reasoning – Bayesian networks – exact inference in BN – approximate inference in BN –
causal networks.
Part A
1. Why does uncertainty arise?
Agents almost never have access to the whole truth about their environment.
Uncertainty arises because of both laziness and ignorance. It is inescapable in
complex, nondeterministic, or partially observable environments.
Agents cannot find a categorical answer.
Uncertainty can also arise because of incompleteness or incorrectness in the agent's
understanding of the properties of the environment.
Probability provides the way of summarizing the uncertainty that comes from
our laziness and ignorance. Probability statements do not have quite the same kind
of semantics known as evidences.
Utility theory says that every state has a degree of usefulness, or utility, to an
agent, and that the agent will prefer states with higher utility.
The fundamental idea of decision theory is that an agent is rational if and only if
it chooses the action that yields the highest expected utility, averaged over all the
possible outcomes of the action. This is called the principle of maximum expected
utility (MEU).
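The MEU principle can be written as a few lines of Python. The actions, probabilities, and utilities below are hypothetical numbers for an umbrella decision: taking the umbrella has expected utility 0.3 × 70 + 0.7 × 80 = 77, leaving it has 0.3 × 0 + 0.7 × 100 = 70, so an MEU agent takes the umbrella.

```python
def expected_utility(outcomes):
    """EU(a) = sum over outcomes of P(outcome) * U(outcome)."""
    return sum(p * u for p, u in outcomes)


def meu_action(actions):
    """Return the action with maximum expected utility, and that utility."""
    best = max(actions, key=lambda a: expected_utility(actions[a]))
    return best, expected_utility(actions[best])


# Hypothetical decision problem: (probability, utility) for rain / no rain
ACTIONS = {
    "take_umbrella": [(0.3, 70), (0.7, 80)],
    "leave_umbrella": [(0.3, 0), (0.7, 100)],
}
print(meu_action(ACTIONS))
```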
P(A|B). Bayes' theorem allows the predicted probability of an event to be updated by
observing new information about the real world.
Example: If the risk of cancer is related to a person's age, then by using Bayes' theorem,
we can determine the probability of cancer more accurately with the help of age.
P(A|B) = [P(A) * P(B|A)] / P(B)
9. Given that P(A) = 0.3, P(A|B) = 0.4 and P(B) = 0.5, compute P(B|A).
P(A|B) = [P(A) * P(B|A)] / P(B)
0.4 = (0.3 * P(B|A)) / 0.5
P(B|A) = (0.4 * 0.5) / 0.3 = 0.2 / 0.3 ≈ 0.67
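The same computation as a small Python helper. Here the hypothesis is B and the evidence is A, so the formula is applied with the roles of A and B exchanged: P(B|A) = P(A|B) × P(B) / P(A) = 0.4 × 0.5 / 0.3 ≈ 0.67.

```python
def posterior(likelihood, prior, evidence):
    """Bayes' theorem: P(H|E) = P(E|H) * P(H) / P(E)."""
    return likelihood * prior / evidence


# Q9: H = B and E = A, so P(B|A) = P(A|B) * P(B) / P(A)
p_b_given_a = posterior(likelihood=0.4, prior=0.5, evidence=0.3)
print(round(p_b_given_a, 2))
```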
also called a Bayes network, belief network, decision network, or Bayesian model.
Bayesian networks are probabilistic, because these networks are built from a
probability distribution, and also use probability theory for prediction and
anomaly detection.
A Bayesian network is a directed graph in which each node is annotated with
Part B
8. How is approximate inference performed in a Bayesian network?
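Approximate inference is usually done by sampling. A minimal Python sketch of one such method, likelihood weighting, applied to the three-coin network of question 9 below: the Coin variable is sampled from its prior, the observed flips are fixed rather than sampled, and each sample is weighted by the probability of the evidence. The sample count and seed are arbitrary choices; the estimates approach the exact posterior as the number of samples grows.

```python
import random

P_HEADS = {"a": 0.2, "b": 0.6, "c": 0.8}   # CPT: P(X_i = H | Coin)


def likelihood_weighting(evidence, n_samples=100_000, seed=0):
    """Estimate P(Coin | X1..Xn = evidence), e.g. evidence = "HHT"."""
    rng = random.Random(seed)
    coins = list(P_HEADS)
    weights = dict.fromkeys(coins, 0.0)
    for _ in range(n_samples):
        coin = rng.choice(coins)                 # sample Coin from its uniform prior
        w = 1.0
        for flip in evidence:                    # evidence variables are not sampled;
            p = P_HEADS[coin]                    # each contributes P(x_i | coin)
            w *= p if flip == "H" else 1.0 - p
        weights[coin] += w
    total = sum(weights.values())
    return {coin: w / total for coin, w in weights.items()}


print(likelihood_weighting("HHT"))
```

Rejection sampling and Gibbs sampling (MCMC) are other common approximate methods; likelihood weighting is shown because it wastes no samples on evidence that does not match.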
9. Construct a Bayesian network and define the necessary CPTs for the given
scenario. We have a bag of three biased coins a, b and c with probabilities of
coming up heads of 20%, 60% and 80% respectively. One coin is drawn
randomly from the bag (with equal likelihood of drawing each of the three
coins), and then the coin is flipped three times to generate the outcomes X1,
X2 and X3.
a. Draw a Bayesian network corresponding to this setup and define the
relevant CPTs.
b. Calculate which coin is most likely to have been drawn if the flips come
up HHT.
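A sketch of the exact answer by enumeration in Python. The network is Coin → X1, X2, X3, with CPTs P(Coin = a) = P(Coin = b) = P(Coin = c) = 1/3 and P(Xi = H | Coin) = 0.2, 0.6, 0.8 for a, b, c. Fractions keep the arithmetic exact. For HHT the likelihoods are 0.032, 0.144 and 0.128, so the posterior is 2/19, 9/19 and 8/19 and coin b is the most likely.

```python
from fractions import Fraction

PRIOR = {coin: Fraction(1, 3) for coin in "abc"}                          # P(Coin)
P_HEADS = {"a": Fraction(1, 5), "b": Fraction(3, 5), "c": Fraction(4, 5)}  # P(X_i = H | Coin)


def posterior_coin(flips):
    """Exact P(Coin | flips) by enumeration; the X_i are independent given Coin."""
    joint = {}
    for coin, p in PRIOR.items():
        for f in flips:
            p *= P_HEADS[coin] if f == "H" else 1 - P_HEADS[coin]
        joint[coin] = p                        # P(Coin = coin, flips)
    evidence = sum(joint.values())             # P(flips), summing out Coin
    return {coin: p / evidence for coin, p in joint.items()}


post = posterior_coin("HHT")
print(post, max(post, key=post.get))
```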
10. Consider the following set of propositions
Patient has spots
PART - A
programming in order to automatically learn and improve with experience. For
example, robots are programmed so that they can perform tasks based on the data
they gather from sensors. Machine learning automatically learns programs from data.
programmed. Data mining, on the other hand, can be defined as the process of
extracting knowledge or unknown interesting patterns from unstructured data.
3. What is ‘Overfitting’ in machine learning?
In machine learning, ‘overfitting’ occurs when a statistical model describes random
error or noise instead of the underlying relationship. Overfitting is normally observed
when a model is excessively complex, because it has too many parameters relative to
the number of training data points. A model that has been overfit performs poorly on
new data.
database and you are forced to come up with a model based on it. In such a situation,
you can use a technique known as cross-validation. In this method the dataset is
split into two sections, a training dataset and a testing dataset; the testing dataset is used
only to test the model, while in the training dataset, the data points will come up with the
Supervised Learning
Unsupervised Learning
Semi-supervised Learning
Reinforcement Learning
Transduction
Learning to Learn
8. What are the three stages to build the hypotheses or model in machine
learning?
Model building
Model testing
Applying the model
9. What is the standard approach to supervised learning?
The standard approach to supervised learning is to split the set of examples into
the training set and the test set.
10. What is a ‘Training set’ and a ‘Test set’?
In various areas of information science like machine learning, the set of data
used to discover the potentially predictive relationship is known as the ‘Training Set’.
The training set is the set of examples given to the learner, while the test set is used to
test the accuracy of the hypotheses generated by the learner; it is the set of examples
held back from the learner. The training set is distinct from the test set.
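A minimal NumPy sketch of the cross-validation idea mentioned above: the data is split into k folds, and each fold serves once as the test set while the remaining folds form the training set. Polynomial regression via numpy.polyfit is an arbitrary choice of model here, and the data is hypothetical; comparing the cross-validated error of a low-degree and a high-degree fit is one way to detect overfitting.

```python
import numpy as np


def k_fold_indices(n, k, seed=0):
    """Shuffle the indices 0..n-1 and split them into k (nearly) equal folds."""
    rng = np.random.default_rng(seed)
    return np.array_split(rng.permutation(n), k)


def cross_val_mse(x, y, degree, k=5):
    """Mean test-fold MSE of a polynomial fit; each fold is held out once."""
    errors = []
    for test_idx in k_fold_indices(len(x), k):
        train_mask = np.ones(len(x), dtype=bool)
        train_mask[test_idx] = False               # training set = all other folds
        coeffs = np.polyfit(x[train_mask], y[train_mask], degree)
        pred = np.polyval(coeffs, x[test_idx])
        errors.append(np.mean((pred - y[test_idx]) ** 2))
    return float(np.mean(errors))


# Hypothetical noisy linear data
rng = np.random.default_rng(1)
x = np.linspace(0.0, 1.0, 30)
y = 2.0 * x + 1.0 + rng.normal(scale=0.1, size=x.size)
for degree in (1, 9):
    print(degree, cross_val_mse(x, y, degree))
```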
11. What is the difference between artificial learning and machine learning?
Designing and developing algorithms according to the behaviours based on
13. What is the main difference between supervised and unsupervised
machine learning?
Supervised learning | Unsupervised learning
The supervised learning technique needs labelled data to train the model. For example, to solve a classification problem (a supervised learning task), you need labelled data to train the model and to classify the data into your labelled groups. | Unsupervised learning does not need any labelled dataset. This is the main difference between supervised learning and unsupervised learning.
demerit of the linear model is overfitting. Similarly, underfitting is also a
significant disadvantage of the linear model.
16. What is the difference between classification and regression?
Classification is used to produce discrete results; it is used to classify data
into specific categories, for example classifying emails into spam and non-spam
categories. Regression analysis, by contrast, is used when we are dealing with
continuous data, for example predicting stock prices at a certain point in time.
17. What is the difference between stochastic gradient descent (SGD) and
gradient descent (GD)?
Both algorithms are methods for finding a set of parameters that minimize a
loss function by evaluating parameters against data and then making adjustments.
In standard gradient descent, you evaluate all training samples for each set of
parameters. This is akin to taking big, slow steps toward the solution. In stochastic
gradient descent, you evaluate only one training sample for the set of parameters
before updating them. This is akin to taking small, quick steps toward the solution.
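A minimal NumPy sketch contrasting the two update rules for linear regression with mean squared error loss. The learning rates, epoch counts, and noise-free data y = 3 + 2x are illustrative assumptions; both versions converge to weights close to [3, 2].

```python
import numpy as np


def gd(X, y, lr=0.1, epochs=2000):
    """Batch gradient descent: one update per pass, using ALL samples."""
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        grad = 2.0 / len(y) * X.T @ (X @ w - y)   # gradient of the mean squared error
        w -= lr * grad
    return w


def sgd(X, y, lr=0.1, epochs=100, seed=0):
    """Stochastic gradient descent: one update per individual sample."""
    rng = np.random.default_rng(seed)
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        for i in rng.permutation(len(y)):          # visit samples in random order
            grad = 2.0 * X[i] * (X[i] @ w - y[i])  # gradient from ONE sample
            w -= lr * grad
    return w


# Hypothetical data: y = 3 + 2x; the column of ones models the bias term
x = np.linspace(0.0, 1.0, 50)
X = np.column_stack([np.ones_like(x), x])
y = 3.0 + 2.0 * x
print(gd(X, y), sgd(X, y))
```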
Least squares problems fall into two categories, linear (ordinary) least
squares and nonlinear least squares, depending on whether or not the residuals are
linear in all unknowns. The linear least-squares problem occurs in statistical
regression analysis; it has a closed-form solution.
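For linear least squares, the closed-form solution is given by the normal equations, w = (XᵀX)⁻¹Xᵀy. A small NumPy sketch on hypothetical noise-free data from y = 1 + 2x1 − 3x2; numpy.linalg.lstsq reaches the same minimiser without forming XᵀX explicitly, which is numerically safer.

```python
import numpy as np

# Hypothetical data generated from y = 1 + 2*x1 - 3*x2 (no noise)
X_raw = np.array([[0, 1], [1, 0], [2, 1], [3, 2], [4, 5]], dtype=float)
y = 1.0 + 2.0 * X_raw[:, 0] - 3.0 * X_raw[:, 1]

X = np.column_stack([np.ones(len(X_raw)), X_raw])   # prepend a bias column of ones

# Closed form via the normal equations: solve (X^T X) w = X^T y
w_normal = np.linalg.solve(X.T @ X, X.T @ y)

# Same least-squares minimiser, computed without forming X^T X
w_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)

print(w_normal, w_lstsq)
```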
19. What is the difference between least squares regression and multiple
regression?
The goal of multiple linear regression is to model the linear relationship
coefficients.
23. What are the advantages of Bayesian Regression?
It is extremely efficient when the dataset is tiny.
It is particularly well-suited for online learning (where data arrives incrementally),
as opposed to batch learning, where we know the complete dataset before we begin
training the model. This means Bayesian regression can be used without having to
store the data.
The Bayesian technique has been successfully applied and is mathematically
robust. Using it requires no additional prior knowledge of the dataset.
Naive Bayes
K-Nearest Neighbors
Decision Tree
Support Vector Machines
intrinsically two-class. For multiclass problem you will need to reduce it into
multiple binary classification problems. Random Forest works well with a mixture
of numerical and categorical features.
Discriminative models are a class of supervised machine learning models
that make predictions by estimating the conditional probability P(y|x). To
use a generative model, more unknowns must be solved: one has to estimate the
probability of each class and the probability of an observation given the class.
problems. A type of discriminative model, the support vector machine (SVM),
creates a decision boundary to segregate n-dimensional space into classes. The
best decision boundary is called a hyperplane, and it is created by choosing the extreme
points, called the support vectors.
results without even using hyper-parameter tuning. Because of its simplicity and
diversity, it is one of the most used algorithms for both classification and
regression tasks.
34. What is Decision Tree Classification?
36. Do you think 50 small decision trees are better than a large one? Why?
Yes, because a random forest is an ensemble method that combines many weak
decision trees into a strong learner. Random forests are more accurate, more
robust, and less prone to overfitting.
37. You’ve built a random forest model with 10000 trees. You were delighted to get
a training error of 0.00, but the validation error is 34.23. What is going on?
Haven’t you trained your model perfectly?
The model has overfitted. A training error of 0.00 means the classifier has
mimicked the training data patterns to such an extent that they are not available in the
unseen data. Hence, when this classifier was run on an unseen sample, it could not
find those patterns and returned predictions with a higher error. In a random forest,
this happens when we use a larger number of trees than necessary. Hence, to avoid
this situation, we should tune the number of trees using cross-validation.
38. When would you use random forests vs SVM, and why?
There are a couple of reasons why a random forest is a better choice of
model than a support vector machine:
● Random forests allow you to determine feature importance. SVMs can’t
do this.
● Random forests are much quicker and simpler to build than an SVM.
● For multi-class classification problems, SVMs require a one-vs-rest method,
Part – B
1. Assume a disease so rare that it is seen in only one person out of every
million. Assume also that we have a test that is effective in that if a person has
the disease, there is a 99 percent chance that the test result will be positive;
however, the test is not perfect, and there is a one in a thousand chance that
the test result will be positive on a healthy person. Assume that a new patient
arrives and the test result is positive. What is the probability that the patient
has the disease?
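A worked sketch of this question in Python, using the numbers stated above: P(disease) = 10⁻⁶, P(positive | disease) = 0.99 and P(positive | healthy) = 0.001. The denominator P(positive) comes from the law of total probability. The result is about 0.00099, i.e. just under 0.1%: because the disease is so rare, almost all positive results come from healthy people.

```python
def posterior_disease(prior=1e-6, sensitivity=0.99, false_positive=1e-3):
    """P(disease | positive) by Bayes' theorem.

    prior          = P(disease)
    sensitivity    = P(positive | disease)
    false_positive = P(positive | healthy)
    """
    # Law of total probability: P(positive) over the diseased and healthy cases
    p_positive = sensitivity * prior + false_positive * (1 - prior)
    return sensitivity * prior / p_positive


print(posterior_disease())
```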
explanation with a diagram. Explain the use of all the terms and constants
that you introduce and comment on the range of values that they can take.
6. Explain the following
a) Linear regression
b) Logistic Regression
PART - A
1. What are bagging and boosting in ensemble learning?
Bagging is a way to decrease the variance in the prediction by generating additional
training data from the dataset, sampling with repetition (with replacement) to produce multi-sets
of the original data. Boosting is an iterative technique that adjusts the weight of an observation
based on the last classification.
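A minimal Python sketch of bagging, assuming hypothetical 1-D data and decision stumps (one-threshold rules) as the base model. Each stump is trained on a bootstrap sample drawn with replacement, and the ensemble predicts by majority vote. Boosting would instead reweight the examples that earlier models misclassified; that is not shown here.

```python
import random
from collections import Counter


def fit_stump(sample):
    """Pick the threshold t that minimises errors of the rule 'predict 1 if x >= t'."""
    candidates = sorted({x for x, _ in sample}) + [float("inf")]

    def errors(t):
        return sum((1 if x >= t else 0) != label for x, label in sample)

    return min(candidates, key=errors)         # smallest t with the fewest errors


def bagging(data, n_models=25, seed=0):
    """Train one stump per bootstrap sample (same size as data, drawn WITH replacement)."""
    rng = random.Random(seed)
    return [fit_stump([rng.choice(data) for _ in data]) for _ in range(n_models)]


def predict(thresholds, x):
    """Aggregate the stumps by majority vote."""
    votes = Counter(1 if x >= t else 0 for t in thresholds)
    return votes.most_common(1)[0][0]


DATA = [(x, 1 if x >= 10 else 0) for x in range(20)]   # hypothetical labelled data
model = bagging(DATA)
print(predict(model, 3), predict(model, 15))
```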
2. What is stacking in ensemble learning?
Stacking is one of the most popular ensemble machine learning techniques; it uses the
predictions of multiple models to build a new model and improve model performance. Stacking
enables us to train multiple models to solve similar problems and, based on their combined
output, builds a new model with improved performance.
3. What are the three types of ensemble learning?
The three main classes of ensemble learning methods are bagging, stacking, and
boosting, and it is important both to have a detailed understanding of each method and to
consider them in your predictive modeling project.
There are two main reasons to use an ensemble over a single model, and they are
related; they are: Performance: An ensemble can make better predictions and achieve better
performance than any single contributing model. Robustness: An ensemble reduces the
spread or dispersion of the predictions and model performance.
A voting classifier is a machine learning estimator that trains various base models or
estimators and predicts on the basis of aggregating the findings of each base estimator. The
aggregating criteria can be combined decision of voting for each estimator output
8. What are Gaussian mixture models? How is expectation maximization used in them?
Expectation maximization provides an iterative solution to maximum
likelihood estimation with latent variables. Gaussian mixture models are an approach
to density estimation where the parameters of the distributions are fit using the
expectation-maximization algorithm.
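A minimal NumPy sketch of EM for a one-dimensional Gaussian mixture. The E-step computes each component's responsibility for each point; the M-step re-estimates weights, means, and variances from those soft assignments. The initialisation (means spread over the data range) and the synthetic data drawn from N(−2, 0.5²) and N(3, 1²) are assumptions for illustration.

```python
import numpy as np


def gaussian_pdf(x, mean, var):
    """Density of N(mean, var), evaluated element-wise."""
    return np.exp(-0.5 * (x - mean) ** 2 / var) / np.sqrt(2.0 * np.pi * var)


def em_gmm_1d(x, k=2, iters=200):
    """Fit a 1-D Gaussian mixture with k components by expectation maximization."""
    means = np.linspace(x.min(), x.max(), k)    # spread initial means over the data
    variances = np.full(k, x.var())
    weights = np.full(k, 1.0 / k)
    for _ in range(iters):
        # E-step: r[i, j] = P(component j | x_i) under the current parameters
        dens = weights * gaussian_pdf(x[:, None], means, variances)
        r = dens / dens.sum(axis=1, keepdims=True)
        # M-step: maximum likelihood re-estimates weighted by the responsibilities
        nk = r.sum(axis=0)
        weights = nk / len(x)
        means = (r * x[:, None]).sum(axis=0) / nk
        variances = (r * (x[:, None] - means) ** 2).sum(axis=0) / nk
    return weights, means, variances


rng = np.random.default_rng(42)
x = np.concatenate([rng.normal(-2.0, 0.5, 300), rng.normal(3.0, 1.0, 700)])
weights, means, variances = em_gmm_1d(x)
print(np.round(weights, 2), np.round(means, 2), np.round(variances, 2))
```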
9. What is k-means unsupervised learning?
K-Means clustering is an unsupervised learning algorithm. Unlike in supervised
learning, there is no labelled data for this clustering. K-Means divides objects into
clusters whose members share similarities and are dissimilar to the objects
belonging to other clusters. The term 'K' is a number: the number of clusters.
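A minimal NumPy sketch of K-Means (Lloyd's algorithm): assign each point to the nearest centroid, move each centroid to the mean of its assigned points, and repeat until the centroids stop moving. The initial centroids are K randomly chosen data points, which is one common choice among several; the two-blob data is hypothetical.

```python
import numpy as np


def k_means(points, k, iters=100, seed=0):
    """Return (centroids, labels) after alternating assignment and update steps."""
    rng = np.random.default_rng(seed)
    centroids = points[rng.choice(len(points), size=k, replace=False)]
    for _ in range(iters):
        # Assignment step: each point joins the cluster of its nearest centroid
        dists = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Update step: move each centroid to the mean of its points (keep it if empty)
        new_centroids = np.array([
            points[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return centroids, labels


pts = np.array([[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]], dtype=float)
print(k_means(pts, 2))
```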
algorithm. K in K-Means refers to the number of clusters, whereas K in KNN is the
number of nearest neighbors (based on the chosen distance metric).
11. What is expectation maximization algorithm used for?
The EM algorithm is used to find (local) maximum likelihood parameters of a
statistical model in cases where the equations cannot be solved directly. Typically
these models involve latent variables in addition to unknown parameters and known
data observations.
12. What is the advantage of Gaussian process?
Gaussian processes are a powerful algorithm for both regression and
classification. Their greatest practical advantage is that they can give a reliable
estimate of their own uncertainty.
Part – B
1. Briefly explain the structure of unsupervised learning.
2. Explain the various learning techniques involved in unsupervised learning.
3. What is a Gaussian process? Explain Gaussian parameter estimates in detail
with suitable examples.
4. Explain the concepts of clustering approaches. How do they differ from classification?
5. List the applications of clustering and identify the advantages and disadvantages of
clustering algorithms.
6. Explain the EM algorithm.
2. Which activation function is used in a multilayer perceptron?
The sigmoid activation function is the traditional choice for activation in multilayer perceptron
neural networks. In MLP and CNN models, ReLU is the default activation function for
hidden layers. In RNN models, the sigmoid or tanh function is used for hidden layers;
tanh generally performs better. Only the identity activation function is considered linear.
including classification. The activation function is the source of the MLP's power. Careful
selection of the activation function has a huge impact on network performance.
ReLU
Leaky ReLU
Parameterised ReLU
Exponential Linear Unit
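The ReLU variants listed above can be written element-wise in NumPy. The Leaky ReLU slope of 0.01 and the ELU α of 1.0 are common defaults rather than fixed values; in Parameterised ReLU the negative slope a is learned during training, so here it is simply passed in.

```python
import numpy as np


def relu(x):
    return np.maximum(0.0, x)


def leaky_relu(x, slope=0.01):
    # small fixed slope for negative inputs instead of a hard zero
    return np.where(x > 0, x, slope * x)


def parametric_relu(x, a):
    # same shape as Leaky ReLU, but the negative slope `a` is a learned parameter
    return np.where(x > 0, x, a * x)


def elu(x, alpha=1.0):
    # smooth exponential curve for negative inputs, saturating at -alpha
    return np.where(x > 0, x, alpha * (np.exp(x) - 1.0))


z = np.array([-2.0, -0.5, 0.0, 1.5])
for fn in (relu, leaky_relu, elu):
    print(fn.__name__, fn(z))
print("parametric_relu", parametric_relu(z, a=0.2))
```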
10. What do you mean by activation function?
An activation function is a function used in artificial neural networks which outputs a
small value for small inputs, and a larger value if its inputs exceed a threshold. If the inputs
are large enough, the activation function "fires", otherwise it does nothing.
Perceptron networks have several limitations. First, the output values of a perceptron can
take on only one of two values (0 or 1) because of the hard-limit transfer function. Second,
perceptrons can only classify linearly separable sets of vectors.
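Both limitations can be seen in a small NumPy sketch of the perceptron learning rule with a hard-limit output. On the linearly separable AND function it reaches 100% training accuracy; on XOR, which is not linearly separable, no single-layer perceptron can classify all four points correctly. The learning rate and epoch count are arbitrary.

```python
import numpy as np


def train_perceptron(X, y, epochs=20, lr=1.0):
    """Rosenblatt perceptron with a hard-limit (step) output of 0 or 1."""
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(epochs):
        for xi, target in zip(X, y):
            out = 1 if xi @ w + b > 0 else 0
            w += lr * (target - out) * xi      # weights change only on mistakes
            b += lr * (target - out)
    return w, b


def accuracy(X, y, w, b):
    preds = (X @ w + b > 0).astype(int)
    return float(np.mean(preds == y))


X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y_and = np.array([0, 0, 0, 1])   # linearly separable
y_xor = np.array([0, 1, 1, 0])   # not linearly separable
for name, y in (("AND", y_and), ("XOR", y_xor)):
    w, b = train_perceptron(X, y)
    print(name, accuracy(X, y, w, b))
```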
single neuron.
and actual outputs. It's an inexact but powerful technique. Stochastic gradient descent is
widely used in machine learning applications.
Gradient Descent is the most common optimization algorithm and the foundation of how
we train an ML model. But it can be really slow for large datasets. That's why we use a
variant of this algorithm known as Stochastic Gradient Descent to make our model learn a lot
faster.
16. What is stochastic gradient descent and why is it used in the training of neural
networks?
Stochastic Gradient Descent is an optimization algorithm that can be used to train neural
network models. The Stochastic Gradient Descent algorithm requires gradients to be
calculated for each variable in the model so that new values for the variables can be
calculated.
17. What are the three main types of gradient descent algorithms?
There are three types of gradient descent learning algorithms: batch gradient descent,
stochastic gradient descent and mini-batch gradient descent.
19. How do you solve the vanishing gradient problem within a deep neural network?
The vanishing gradient problem is caused by the derivative of the activation function used
to create the neural network. The simplest solution to the problem is to replace the activation
function of the network. Instead of sigmoid, use an activation function such as ReLU
network. This means that a node with this problem will forever output an activation value of
0.0. This is referred to as a “dying ReLU”.
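A small Python sketch of why sigmoid causes vanishing gradients and ReLU avoids it (weights are ignored here, and the pre-activation value 0.5 is hypothetical). Backpropagation multiplies one activation derivative per layer; the sigmoid derivative is at most 0.25, so the product shrinks geometrically with depth, while ReLU's derivative is 1 for active units and 0 for a dead unit.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def sigmoid_grad(x):
    s = sigmoid(x)
    return s * (1.0 - s)                 # at most 0.25, reached at x = 0


def relu_grad(x):
    return 1.0 if x > 0 else 0.0         # 1 for active units, 0 for dead ones


def chain_factor(grad_fn, pre_activations):
    """Product of activation derivatives along one path through the layers."""
    factor = 1.0
    for z in pre_activations:
        factor *= grad_fn(z)
    return factor


layers = [0.5] * 20
print(chain_factor(sigmoid_grad, layers))            # shrinks towards 0
print(chain_factor(relu_grad, layers))               # stays 1.0
print(chain_factor(relu_grad, [0.5] * 19 + [-0.5]))  # one dead unit blocks the gradient
```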
21. Why is ReLU used in deep learning?
The ReLU function is another non-linear activation function that has gained popularity in
the deep learning domain. ReLU stands for Rectified Linear Unit. The main advantage of
using the ReLU function over other activation functions is that it does not activate all the
Part – B
1. Draw the architecture of a single layer perceptron (SLP) and explain its