King Fahd University of Petroleum and Minerals

College of Computing and Mathematics


Computer Engineering Department

COE 292: Introduction to Artificial Intelligence


Term 212 (Spring 2022)
Final Exam (Answers Code 001)
Monday May 16, 2022

Time: 120 minutes, Total Pages: 23 (including cover page)

Name:

ID:

Section:

Instructor Section Time Instructor Section Time


Akram Ahmad 01 0800-0850 Ahmad Almulhem 08 1100-1150
AbdulJabbar Siddiqui 02 0800-0850 Ahmad Almulhem 09 1200-1250
Akram Ahmad 03 0900-0950 Abdulaziz Barnawi 10 1200-1250
Ahmad Almulhem 04 0900-0950 AbdulJabbar Siddiqui 14 0900-0950
C Kamal 05 1000-1050 Abdulaziz Barnawi 15 1000-1050
AbdulJabbar Siddiqui 06 1000-1050 C Kamal 16 1100-1150
Akram Ahmad 07 1100-1150 C Kamal 18 1300-1350

Answers Code 001 Page 1 of 23


1. In 10-fold cross validation, the dataset is randomly divided into 10 equal sets,
where

(a) 5 sets are used for training and 5 sets are used for testing.
(b) 9 sets are used for training and 1 set is used for testing.
(c) 1 random set is used for training and a different 1 random set is used for testing.
(d) 1 set is used for training and 9 sets are used for testing.
(e) 1 random set is used for training and the same random set is used for testing.

Correct answers: (b)
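
The 9-to-1 split in 10-fold cross validation can be sketched as follows. This is a minimal illustration, not part of the exam: the function name `k_fold_splits`, the sample count of 100, and the fixed seed are all assumptions chosen for the example.

```python
import random

def k_fold_splits(n_samples, k=10, seed=0):
    """Yield (train_indices, test_indices) once for each of the k folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)        # random division of the dataset
    folds = [indices[i::k] for i in range(k)]   # k folds of near-equal size
    for i in range(k):
        test = folds[i]                         # 1 fold held out for testing
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test                       # remaining k-1 folds for training

# Hypothetical dataset of 100 samples: each round trains on 90 and tests on 10.
splits = list(k_fold_splits(100, k=10))
```

Every sample lands in the test fold exactly once across the 10 rounds, which is why the 10 test scores are usually averaged.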

2. In uniform cost search enhanced with the extended list rule, which of the following is
correct:

(a) If you reach a node N that was extended before with lower cost, stop and do not
extend it.
(b) If you reach a node N that was previously extended, then it will always be extended.
(c) If you reach a node N that was not previously extended, stop and do not extend it.
(d) If you reach a node N that was extended before with higher cost, stop and do not
extend it.
(e) None of the above.

Correct answers: (a)
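
The extended list rule can be sketched as below. This is an illustrative implementation under stated assumptions: the graph, its edge costs, and the function name `uniform_cost_search` are hypothetical, and edge costs are assumed non-negative so that the first time a node is extended it is reached at its lowest cost.

```python
import heapq

def uniform_cost_search(graph, start, goal):
    """Uniform cost search that never extends the same node twice."""
    frontier = [(0, start, [start])]   # (path cost, node, path)
    extended = set()                   # the extended list
    while frontier:
        cost, node, path = heapq.heappop(frontier)
        if node in extended:
            continue                   # already extended at no greater cost: skip
        extended.add(node)
        if node == goal:
            return cost, path
        for neighbor, step in graph.get(node, {}).items():
            if neighbor not in extended:
                heapq.heappush(frontier, (cost + step, neighbor, path + [neighbor]))
    return None

# Hypothetical graph: B is reachable directly (cost 4) and via A (cost 3).
graph = {"S": {"A": 1, "B": 4}, "A": {"B": 2, "G": 6}, "B": {"G": 1}}
result = uniform_cost_search(graph, "S", "G")   # (4, ['S', 'A', 'B', 'G'])
```

In this run, B is first extended via S-A-B at cost 3, so the later queue entry S-B at cost 4 is discarded without being extended.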



3. Consider a k-NN classifier. Based on the graph below, which shows the error rate vs. the value
of k, at what value of k does k-NN perform optimally?

(a) k=10
(b) k=29
(c) k=9
(d) k=22
(e) k=20

Correct answers: (a)



4. Given the following steps for A* search, which items were removed due to the extended
list rule?

1. {S}
cost: 10
2. {SB, SA}
cost: 6 11
3. {SA, SBD, SBA}
cost: 11 14 17
4. {SAB, SAD, SBD, SBA}
cost: 8 11 14 17
5. {SAD, SBD, SBA}
cost: 11 14 17
6. {SADB, SADE, SADG, SBD, SBA}
cost: 11 11 11 14 17
7. {SADE, SADG, SBD, SBA}
cost: 11 11 14 17
8. {SADEG, SADG, SBD, SBA}
cost: 9 11 14 17

(a) SBA and SADB


(b) SAB and SADB
(c) SADB and SADE
(d) SAD and SBA
(e) None of the above

Correct answers: (b)

5. To detect an object in an image of size 4x4, a designer selects a filter of size 2x2 with a
bias of zero as shown below:

Image:
1 1 0 0
0 1 1 0
0 0 1 1
1 0 0 1

Filter:
1 0
0 1

What will be the output of the convolution if we use a stride of 2?

(a) 2 2 0
    0 2 2
    0 0 2

(b) 2 0
    0 2

(c) 3 1
    1 3

(d) 0 2
    2 0

(e) 2 3 1
    0 3 2
    0 0 2

Correct answers: (b)
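
The strided convolution in question 5 can be checked with a short sketch. The function name `conv2d` is an assumption for illustration; it computes a valid (no padding) cross-correlation, which is what CNN layers typically call convolution, and the filter here is symmetric under flipping, so both conventions agree.

```python
def conv2d(image, kernel, stride=1, bias=0):
    """Valid (no padding) 2D cross-correlation with the given stride."""
    kh, kw = len(kernel), len(kernel[0])
    out = []
    for r in range(0, len(image) - kh + 1, stride):
        row = []
        for c in range(0, len(image[0]) - kw + 1, stride):
            # Multiply the patch by the kernel element-wise and sum.
            s = sum(image[r + i][c + j] * kernel[i][j]
                    for i in range(kh) for j in range(kw))
            row.append(s + bias)
        out.append(row)
    return out

image = [[1, 1, 0, 0],
         [0, 1, 1, 0],
         [0, 0, 1, 1],
         [1, 0, 0, 1]]
kernel = [[1, 0],
          [0, 1]]

result = conv2d(image, kernel, stride=2)   # [[2, 0], [0, 2]]
```

With a stride of 2 the filter visits only the four non-overlapping 2x2 patches, so the output is 2x2; a stride of 1 would give a 3x3 output instead.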



6. An intelligent approach that can be used in solving Constraint Satisfaction Problems
(CSP) is called the ________ approach. In this approach, when assigning a variable at a
certain step results in a failure to assign a subsequent variable, the algorithm backtracks
immediately to the variable causing the failure.

(a) flight scheduling


(b) forward checking
(c) conflict set
(d) lexical order
(e) minimum remaining values (MRV)

Correct answers: (c)

7. Consider the following gray level image.

2 2 7 3
9 4 6 1
8 5 2 4
3 1 2 6

What is the output of applying a 2x2 max-pooling filter with a stride of 2?

(a) 9 7
    8 6

(b) 9 9 7 7
    9 9 7 7
    8 8 6 6
    8 8 6 6

(c) 5 2
    9 1

(d) 2

(e) 9 7 7
    9 6 6
    8 5 6

Correct answers: (a)
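
The pooling in question 7 can be reproduced with a small sketch. The function name `max_pool` is an assumption for illustration; it applies no padding.

```python
def max_pool(image, size=2, stride=2):
    """Max-pooling over size x size windows, moving by `stride`, no padding."""
    out = []
    for r in range(0, len(image) - size + 1, stride):
        out.append([
            max(image[r + i][c + j] for i in range(size) for j in range(size))
            for c in range(0, len(image[0]) - size + 1, stride)
        ])
    return out

image = [[2, 2, 7, 3],
         [9, 4, 6, 1],
         [8, 5, 2, 4],
         [3, 1, 2, 6]]

pooled = max_pool(image, size=2, stride=2)   # [[9, 7], [8, 6]]
```

Each output value is the largest entry of one non-overlapping 2x2 block of the image.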

8. Suppose we have two machine learning models, named Model1 and Model2 respectively.
These models are trained and tested using some data. The error rate of Model1 on the
training set is low and that on the testing set is high. The error rate of Model2 is high on
both the training set and the testing set. Which of the following statements is most likely to
be TRUE:

(a) Model1 is underfitting on training data, Model2 is overfitting on training data.


(b) Model1 is overfitting on testing data, Model2 is overfitting on testing data.
(c) Model1 is overfitting on training data, Model2 is underfitting on training data.
(d) Model1 is underfitting on testing data, Model2 is underfitting on testing data.
(e) None of the above.

Correct answers: (c)



9. Which of the following statements is most accurate when considering the k-NN algorithm:

(a) A very large value of k always leads to a more accurate classification.


(b) A very small value of k makes the algorithm highly sensitive to noisy data (e.g.,
outliers).
(c) A very small value of k makes the algorithm less sensitive to noisy data (e.g.,
outliers).
(d) A very large value of k makes the algorithm highly sensitive to noisy data (e.g.,
outliers).
(e) None of the above.

Correct answers: (b)

10. What Boolean function is implemented by the following perceptron with the given weights
(w1 = w2 = 0.6) and threshold (1)?

(a) OR gate
(b) XOR gate
(c) NOT gate
(d) AND gate
(e) None of the above

Correct answers: (d)
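
The truth table for question 10 can be enumerated directly. This sketch assumes the perceptron outputs 1 when the weighted sum is greater than or equal to the threshold; the function name `perceptron` is illustrative.

```python
from itertools import product

def perceptron(inputs, weights, threshold):
    """Return 1 if the weighted sum reaches the threshold, otherwise 0."""
    total = sum(w * x for w, x in zip(weights, inputs))
    return 1 if total >= threshold else 0

# Weights w1 = w2 = 0.6 and threshold 1, as given in the question.
truth_table = {
    (x1, x2): perceptron((x1, x2), (0.6, 0.6), 1)
    for x1, x2 in product((0, 1), repeat=2)
}
# Only the input (1, 1) reaches the threshold (0.6 + 0.6 = 1.2 >= 1).
```

A single active input contributes only 0.6, which is below the threshold, so the unit fires only when both inputs are 1.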



11. Using the k-means algorithm with Euclidean distance, some training data
were grouped into three clusters. Let the current centroids of the three learnt clusters
be C1 = (2,10), C2 = (6,6), C3 = (1.5,3.5). If there are three new test points
A1 = (2.5,10), A2 = (2,4), A3 = (4,9), which cluster centroid would each of the given
test points be assigned to?

(a) A1 to C2, A2 to C3, A3 to C1


(b) A1 to C3, A2 to C2, A3 to C3
(c) A1 to C2, A2 to C3, A3 to C3
(d) A1 to C1, A2 to C3, A3 to C1
(e) A1 to C1, A2 to C1, A3 to C2

Correct answers: (d)
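
The assignment step for question 11 can be sketched as a nearest-centroid lookup. The helper name `nearest_centroid` is an assumption for illustration; `math.dist` (Python 3.8+) computes the Euclidean distance.

```python
import math

centroids = {"C1": (2, 10), "C2": (6, 6), "C3": (1.5, 3.5)}
points = {"A1": (2.5, 10), "A2": (2, 4), "A3": (4, 9)}

def nearest_centroid(point, centroids):
    """Return the name of the centroid with the smallest Euclidean distance."""
    return min(centroids, key=lambda name: math.dist(point, centroids[name]))

assignments = {name: nearest_centroid(p, centroids) for name, p in points.items()}
# {'A1': 'C1', 'A2': 'C3', 'A3': 'C1'}
```

For example, A3 = (4,9) is about 2.24 from C1, 3.61 from C2 and 6.04 from C3, so it joins C1.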

12. Which of the following statement(s) are true about SVM kernels?

i. Kernel functions map low dimensional data to a higher dimensional space

ii. Kernel functions stretch the space to separate different classes

iii. Kernel functions always cause SVM overfitting

(a) Statement (ii) is correct only


(b) Statements (i) and (ii) are correct
(c) Statements (i), (ii) and (iii) are correct
(d) Statement (i) is correct only
(e) Statements (ii) and (iii) are correct

Correct answers: (b)



13. A Boolean function is to be implemented using an MLP with appropriate weights and
thresholds. Below is the truth table of the required Boolean function and the corresponding
suggested architecture of the MLP.

X Y Z
0 0 1
0 1 0
1 0 0
1 1 1

Indicate the correct thresholds T1, T2 and T3 to implement the function using
the shown MLP. Note that the T1 and T2 perceptrons are Universal OR gates, while the T3
perceptron is a Universal AND gate.

(a) T1 = 1, T2 = -1, T3 = 2
(b) T1 = -1, T2 = 1, T3 = -1
(c) T1 = 0, T2 = 0, T3 = 2
(d) T1 = 1, T2 = -1 , T3 = 1
(e) T1 = 0, T2 = 1, T3 = -1

Correct answers: (c)

14. When are you more likely to consider using SVM over other classifiers?

(a) When there is a need to increase data points (more training data).
(b) When there is a need to decrease data points (remove some training data).
(c) When there is a need to calculate more variables that are related to the training data
to have a better classification.
(d) When other classifiers fail due to the lack of hidden structure in the training data.
(e) None of the above.

Correct answers: (c)



15. Consider the figure shown below. Suppose a computer program attempts to color nodes
such that nodes having an edge connecting them must be colored differently. The program
can select colors from the domain C1, C2, C3. Furthermore, the computer program is
written to select nodes in lexical order for coloring.

Suppose we use Backtracking search with filtering (Forward checking) and Minimum
Remaining Values (MRV). Which of the following is the correct sequence of color
assignment to all nodes, starting with the assignment A = C1?

(a) A, B, E, D, F, C, H, G, I
(b) A, B, C, D, E, F, G, H, I
(c) A, B, E, D, F, C, G, H, I
(d) A, D, E, B, F, C, H, G, I
(e) None of the above

Correct answers: (c)



16. The Maximum Margin Classifier is described as:

(a) The classifier that finds the smallest distance among support vectors of different
classes.
(b) The classifier that finds some arbitrary line that can classify different sets of data.
(c) The classifier that uses a kernel to find a good line that separates different classes.
(d) The classifier that finds the widest distance among support vectors of different
classes.
(e) None of the above.

Correct answers: (d)

17. Identify the type of learning in which labeled training data is used:

(a) Supervised Learning


(b) Unsupervised Learning
(c) Discovery based learning
(d) Learning by induction
(e) None of the above

Correct answers: (a)



18. Suppose you run gradient descent on the function f(x) for a few iterations with a step size
η = 0.2, and df/dx is computed after each iteration. You find that the value of df/dx decreases
until it reaches a local minimum at X5. Furthermore, assume that ηopt = 0.3. Based on
this, which of the following statements is TRUE:

(a) For step size η > 0.6, the gradient descent can escape the local minimum and
attempt to find the global minimum.
(b) For step size η < 0.3, the gradient descent can guarantee convergence to the global
minimum.
(c) If we start with step size η = 1.0 rather than η = 0.2, gradient descent guarantees
convergence to the global minimum.
(d) For any step size η, the gradient descent cannot find the global minimum once it
finds the local minimum at X5 .
(e) None of the above.

Correct answers: (a)



19. Given the MLP architecture below where X,Y,Z and W are the inputs. The output F
represents a Boolean function F(X,Y,Z,W). Each perceptron uses a Threshold activation
function with a threshold value indicated within each perceptron. Which of the following
Boolean functions, F, is represented by the given MLP architecture?

(a) F = ((X ∨ Y ) ∧ ((X ∨ ¬Z))) ∨ ((Y ∨ W ) ∨ (X ∧ Z ∧ W ))


(b) F = ((X ∨ Y ) ∧ ((X ∨ ¬Z))) ∨ ((Y ∨ W ) ∨ ¬(X ∧ Z ∧ W ))
(c) F = (¬(X ∨ Y ) ∧ (¬(X ∨ ¬Z))) ∨ (¬(Y ∨ W ) ∨ (X ∧ Z ∧ W ))
(d) F = ((¬X ∨ ¬Y ) ∧ ((¬X ∨ Z))) ∨ ((¬Y ∨ ¬W ) ∨ (X ∧ Z ∧ W ))
(e) F = ((X ∧ Y ) ∨ ((X ∧ ¬Z))) ∧ ((Y ∧ W ) ∨ ¬(X ∨ Z ∨ W ))

Correct answers: (b)



20. Suppose we have a Neural Network with the following given architecture:

1. Number of perceptrons in the input layer is 3

2. Number of hidden layers is 1

3. Number of perceptrons in the hidden layer is 5

4. Number of perceptrons in the output layer is 4

What will be the total number of weights and biases that we will need to train if the
neural network has a fully connected architecture?

(a) 15
(b) 20
(c) 60
(d) 35
(e) 44

Correct answers: (e)
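
The count in question 20 can be reproduced with a short sketch. It assumes that input units carry no trainable parameters of their own, and that every hidden and output unit has one bias; the function name `count_parameters` is illustrative.

```python
def count_parameters(layer_sizes):
    """Return (weights, biases) for a fully connected feed-forward network."""
    # One weight per connection between consecutive layers.
    weights = sum(a * b for a, b in zip(layer_sizes, layer_sizes[1:]))
    # One bias per unit in every layer except the input layer.
    biases = sum(layer_sizes[1:])
    return weights, biases

weights, biases = count_parameters([3, 5, 4])   # input 3, hidden 5, output 4
total = weights + biases                        # 35 + 9 = 44
```

The 35 weights come from 3 x 5 = 15 input-to-hidden connections plus 5 x 4 = 20 hidden-to-output connections.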

21. Which of the following is NOT true about the backpropagation algorithm in neural
networks?

(a) The forward pass involves computing partial derivatives of an error function with
respect to the network parameters.
(b) The algorithm is used for neural networks learning.
(c) The algorithm uses gradient descent to minimize a divergence function.
(d) Adjusting the neural network parameters depends on the gradients computations.
(e) The activation functions used must be differentiable.

Correct answers: (a)



22. Consider the below single perceptron with weights W0 = −5, W1 = 2, W2 = −1, and
W3 = 3.

What will be the output y, given the inputs X1 = 3, X2 = 2, and X3 = 4 for each of the
following activation functions?

• Threshold activation function with T = 5

• ReLU activation function: f(z) = z if z ≥ 0, and 0 otherwise.

(a) 0 for the Threshold activation function, 1 for ReLU activation function
(b) 11 for the Threshold activation function, 11 for ReLU activation function
(c) 0 for the Threshold activation function, 0 for ReLU activation function
(d) 1 for the Threshold activation function, 11 for ReLU activation function
(e) 1 for the Threshold activation function, 16 for ReLU activation function

Correct answers: (e)



23. Consider performing K-Means Clustering on a one-dimensional dataset containing five
sample points: p1 = 5, p2 = 7, p3 = 10, p4 = 12 and p5 = 13. Use k = 2 with
initial centroids c1 = 3.0 and c2 = 15.0.

What are the initial cluster assignments? (Which sample points are in cluster c1 and
which sample points are in cluster c2?)

(a) C1 = {P1}, C2 = {P2, P3, P4, P5}


(b) C1 = {P1, P2, P3}, C2 = {P4, P5}
(c) C1 = {P1, P2}, C2 = {P3, P4, P5}
(d) C1 = {}, C2 = {P1, P2, P3, P4, P5}
(e) None of the above

Correct answers: (c)
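
The initial assignment in question 23 can be sketched in a few lines. The centroid update at the end goes one step beyond what the question asks and is included only to show how the next iteration would begin.

```python
points = {"P1": 5, "P2": 7, "P3": 10, "P4": 12, "P5": 13}
centroids = {"C1": 3.0, "C2": 15.0}

# Assignment step: each point joins the centroid at the smallest distance.
clusters = {name: [] for name in centroids}
for label, p in points.items():
    nearest = min(centroids, key=lambda c: abs(p - centroids[c]))
    clusters[nearest].append(label)
# clusters == {'C1': ['P1', 'P2'], 'C2': ['P3', 'P4', 'P5']}

# Update step (not asked in the question): move each centroid to its cluster mean.
updated = {c: sum(points[l] for l in members) / len(members)
           for c, members in clusters.items()}
```

P3 = 10 is the deciding point: it is 7 away from c1 and 5 away from c2, so it goes to cluster c2.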

24. Suppose the following are the required stages in a CNN: (1) Pooling (2) Flattening (3) Softmax
activation function (4) Convolution. If each stage is used only once in our
CNN, in which order are these stages performed?

(a) 4, 1, 3, 2
(b) 3, 1, 4, 2
(c) 4, 1, 2, 3
(d) 2, 3, 4, 1
(e) 3, 4, 2, 1

Correct answers: (c)



25. In the figure below there are 12 samples from a 1-dimensional dataset (i.e., 1 feature only,
labeled p). Assume that all the samples belong to two classes only, namely “X” and “O”.
In the figure, if more than one sample in the training data has the same feature value,
they are drawn vertically stacked. For example, two samples in
the training data share each of the values 4, 6 and 10, as shown in the figure. Using the
given figure as the training data, indicate the correct predictions resulting from the k-NN
classification of a new test point at p = 7 (not shown in the figure) when we use k = 1,
k = 3 and k = 7.

(a) “O” (if k=1), “O” (if k=3), “O” (if k=7)
(b) “X” (if k=1), “O” (if k=3), “X” (if k=7)
(c) “X” (if k=1), “X” (if k=3), “O” (if k=7)
(d) “X” (if k=1), “X” (if k=3), “X” (if k=7)
(e) “O” (if k=1), “X” (if k=3), “X” (if k=7)

Correct answers: (e)

26. For a convex error function (i.e., with a bowl shape), which of the following is true, given
the optimal learning rate?

(a) Gradient descent is guaranteed to converge to the global minimum.


(b) Gradient descent is always divergent.
(c) Gradient descent may converge or diverge if step size is less than the optimal step
size.
(d) Gradient descent is not guaranteed to converge to the global minimum.
(e) None of the above.

Correct answers: (a)
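
The effect of the step size on a convex function can be illustrated with a small sketch. The function f(x) = x², the starting point 4.0, and the step sizes are assumptions chosen for the example; for this quadratic the optimal step size is 1/f''(x) = 0.5, and steps larger than twice that value diverge.

```python
def gradient_descent(grad, x0, eta, steps=50):
    """Run a fixed number of gradient descent updates and return the final x."""
    x = x0
    for _ in range(steps):
        x = x - eta * grad(x)
    return x

grad = lambda x: 2 * x   # derivative of the assumed f(x) = x**2

x_small = gradient_descent(grad, 4.0, eta=0.2)           # slow but steady convergence
x_opt = gradient_descent(grad, 4.0, eta=0.5, steps=1)    # optimal step: minimum in one update
x_large = gradient_descent(grad, 4.0, eta=1.2)           # above 2 * 0.5: the iterates blow up
```

Each update multiplies x by (1 - 2η), so the iterates shrink when that factor has magnitude below 1 and grow when it exceeds 1.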



27. An Artificial Neural Network (ANN) has the shown architecture and the given weights
and biases:

w1 = 1 b1 = 0
w2 = -1 b2 = 0
w3 = 1 b3 = 0
w4 = 1

Assume that all neurons use the ReLU activation function, which is given by
f(z) = z if z ≥ 0, and 0 otherwise.
Which of the following are the correct values of the output y when the input x = 5 and
x = -5, respectively?

(a) y = 5, y = 5
(b) y = -10, y = 10
(c) y = 10, y = -10
(d) y = 0, y = 0
(e) y = -5, y = 5

Correct answers: (a)

28. Referring to question 27, which of the following functions represents the output y of the
given ANN?

(a) y = -x
(b) y = min (0, x)
(c) y = x
(d) y = |x|
(e) y = max(0, x)

Correct answers: (d)
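
The answers to questions 27 and 28 can be checked numerically. Since the figure is not reproduced here, the wiring below is an assumption: two hidden ReLU neurons receive x through w1 and w2 (with biases b1 and b2), and one output ReLU neuron combines them through w3 and w4 (with bias b3).

```python
def relu(z):
    return z if z >= 0 else 0

def ann(x, w1=1, w2=-1, w3=1, w4=1, b1=0, b2=0, b3=0):
    """Assumed 1-2-1 network: two hidden ReLU units feeding one ReLU output."""
    h1 = relu(w1 * x + b1)   # passes positive inputs unchanged
    h2 = relu(w2 * x + b2)   # passes the magnitude of negative inputs
    return relu(w3 * h1 + w4 * h2 + b3)

y_pos = ann(5)    # 5
y_neg = ann(-5)   # 5
```

Under this wiring exactly one hidden unit is active for any nonzero input, and it outputs the magnitude of x, which gives y = |x|.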



29. Which of the following statements is true about the k-Nearest Neighbors classifier:

(a) k-NN requires similar computation time in testing as in training.


(b) k-NN requires higher computation time in testing than in training.
(c) k-NN requires higher computation time in training than in testing.
(d) k-NN is not a classifier but rather an efficient clustering technique.
(e) None of the above.

Correct answers: (b)

30. Consider the following table representing the heuristics h1(x), h2(x) and h3(x) for estimating
the distance from each node x to the goal node G in the graph below. Which of
these heuristics is/are admissible?

S A B C D G
h1(x) 7 6 7 3 1 0
h2(x) 6 6 3 3 1 0
h3(x) 5 5 3 1 1 0

(a) h3(x) only


(b) h1(x) and h2(x)
(c) h2(x) and h3(x)
(d) h1(x) only
(e) h2(x) only

Correct answers: (c)



Written Part

Write your name and ID below:

Write your name clearly inside this box

Write your ID inside this box



Q1 (15 points)
Given the following data points and labels shown below:

Point Coor. Class


A (2, -2) White
B (2, -1) Black
C (0, -1) Black
D (1, 1) Black
E (2.5, 2) Black
F (-1, 3) White
G (-3, 2) White
H (-2, -2) White
I (-2, -1) White
J (-3, 1) White
T (0,1) New

a. Ignoring the point “T” with class labeled as “New”, draw on the graph above the decision
boundaries for 1-Nearest Neighbors (5 points)

b. Identify which class a k-NN classifier will predict if the point labeled “New” is classified using
the values of k that are shown in the table below: (5 points)

k Predicted Class List of k nearest points

10
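
One way to fill such a table is to rank the training points by Euclidean distance to T = (0,1) and take a majority vote. This is a sketch, not the official solution: the function name `knn_predict` is illustrative, and k = 1 and k = 3 are example values chosen here, since only k = 10 is legible in the table. Ties in distance (for example B and I, both at √8) are broken by listing order, which is an assumption.

```python
import math
from collections import Counter

training = {
    "A": ((2, -2), "White"), "B": ((2, -1), "Black"), "C": ((0, -1), "Black"),
    "D": ((1, 1), "Black"),  "E": ((2.5, 2), "Black"), "F": ((-1, 3), "White"),
    "G": ((-3, 2), "White"), "H": ((-2, -2), "White"), "I": ((-2, -1), "White"),
    "J": ((-3, 1), "White"),
}

def knn_predict(query, k):
    """Return (majority class, names of the k nearest training points)."""
    ranked = sorted(training, key=lambda n: math.dist(query, training[n][0]))
    neighbors = ranked[:k]
    votes = Counter(training[n][1] for n in neighbors)
    return votes.most_common(1)[0][0], neighbors

T = (0, 1)
pred_1 = knn_predict(T, 1)     # ('Black', ['D'])
pred_3 = knn_predict(T, 3)     # ('Black', ['D', 'C', 'F'])
pred_10 = knn_predict(T, 10)   # all 10 points vote: 6 White vs 4 Black
```

With k = 10 every training point votes, so the prediction is simply the majority class of the whole training set.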



c. What will be the effect of removing the point in each row of the table below on the boundary
of 1-NN, i.e., indicate whether removing it affects (Yes) or does not affect (No) the boundary? (5 points: 1 point
each)

Point Will removing it change the decision boundary? (Yes/No)



Q2 (10 points)
Given the shown single perceptron with a ReLU activation function defined as
f(z) = z if z ≥ 0, and 0 otherwise,
with derivative
f′(z) = 1 if z ≥ 0, and 0 otherwise.

a. Given W1 = 2, X1 = 4, W2 = −1, X2 = 3, and W0 = −1, use the backpropagation algorithm
to fill the following table:

Forward Pass | Backward Pass
Z  Y | ∂Y/∂Z  ∂Y/∂X1  ∂Y/∂W1  ∂Y/∂X2  ∂Y/∂W2  ∂Y/∂W0
1

b. (2 points) Fill in the blanks based on the results in part a:

(a) In order to decrease the output (Y), w1 should be (increased/decreased).


(b) In order to decrease the output (Y), x2 should be (increased/decreased).
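
The forward and backward passes can be sketched with the chain rule. Because the figure is not reproduced here, this assumes Z = W1·X1 + W2·X2 + W0, with W0 acting as a bias on a constant input of 1, and Y = f(Z); it is an illustration under those assumptions rather than the official solution.

```python
def relu(z):
    return z if z >= 0 else 0

def relu_grad(z):
    return 1 if z >= 0 else 0

W1, X1, W2, X2, W0 = 2, 4, -1, 3, -1

# Forward pass (assumed form: weighted sum plus bias, then ReLU).
Z = W1 * X1 + W2 * X2 + W0   # 8 - 3 - 1 = 4
Y = relu(Z)                  # 4

# Backward pass: chain rule through Y = f(Z).
dY_dZ = relu_grad(Z)         # 1, since Z >= 0
grads = {
    "dY/dX1": dY_dZ * W1,    # 2
    "dY/dW1": dY_dZ * X1,    # 4
    "dY/dX2": dY_dZ * W2,    # -1
    "dY/dW2": dY_dZ * X2,    # 3
    "dY/dW0": dY_dZ * 1,     # 1 (assumed constant bias input of 1)
}
```

The sign of each partial derivative gives the direction of influence: a positive value means that decreasing the quantity lowers Y, while a negative value means that increasing it lowers Y.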



Q3 (15 points)
Design an MLP neural network to classify the given decision boundary shown below, assuming
a simple perceptron: output = 1 if Σi wi xi ≥ T. The MLP must classify points inside the given
shape versus points that are outside. Note that the coordinates of the points on the given shape
are: P1 = (-1,2), P2 = (-1,-2), P3 = (0,0), P4 = (1,1), P5 = (2,0), P6 = (1,-1).

a. Draw the architecture of the MLP that can classify data within the shape shown in the graph above.
Assume that the values of all the biases in the MLP are set to zero. [2.5 Points]

b. On the drawing, show the threshold value of each perceptron by writing it inside the perceptron.
[2.5 Points]

c. On the drawing, clearly show the weights used for each connection in your architecture. [10
Points]
