212 Final-Solution
Name:
ID:
Section:
(a) 5 sets are used for training and 5 sets are used for testing.
(b) 9 sets are used for training and 1 set is used for testing.
(c) 1 random set is used for training and a different 1 random set is used for testing.
(d) 1 set is used for training and 9 sets are used for testing.
(e) 1 random set is used for training and the same random set is used for testing.
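Options (a) to (e) read like the standard question on 10-fold cross-validation. The stem did not survive extraction, so that framing is an assumption. In k-fold cross-validation the data is cut into k folds, and each round trains on k - 1 folds and tests on the remaining one. A minimal sketch of the split, using a hypothetical dataset of 100 samples:

```python
def k_fold_splits(n_samples, k):
    """Yield (train_indices, test_indices) for k-fold cross-validation.

    The data is cut into k contiguous folds; each fold is the test set exactly
    once while the remaining k - 1 folds form the training set.
    """
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        yield train, test
        start += size


# 10-fold cross-validation on a hypothetical dataset of 100 samples
splits = list(k_fold_splits(100, 10))
print(len(splits), len(splits[0][0]), len(splits[0][1]))  # 10 90 10
```

Every sample is used for testing exactly once across the 10 rounds.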
2. In uniform cost search enhanced with the extended list rule, which of the following is correct?
(a) If you reach a node N that was extended before with lower cost, stop and do not
extend it.
(b) If you reach a node N that was previously extended, then it will always be extended.
(c) If you reach a node N that was not previously extended, stop and do not extend it.
(d) If you reach a node N that was extended before with higher cost, stop and do not
extend it.
(e) None of the above.
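A minimal Python sketch of uniform cost search with an extended (closed) list, run on a small hypothetical graph rather than the exam's graph. UCS always pops the cheapest path first. When a node comes off the queue after it has already been extended, it was reached earlier at no higher cost, so the node is skipped instead of being extended again:

```python
import heapq


def uniform_cost_search(graph, start, goal):
    """Uniform cost search with an extended list.

    graph maps each node to a list of (neighbor, edge_cost) pairs.
    Returns (path, cost), or (None, inf) if the goal is unreachable.
    """
    frontier = [(0, [start])]      # priority queue ordered by path cost g
    extended = set()               # nodes that have already been extended
    while frontier:
        cost, path = heapq.heappop(frontier)
        node = path[-1]
        if node == goal:           # goal test when the path is popped
            return path, cost
        if node in extended:
            # Extended-list rule: this node was already extended via a path
            # that was no more expensive, so do not extend it again.
            continue
        extended.add(node)
        for neighbor, step in graph.get(node, []):
            if neighbor not in extended:
                heapq.heappush(frontier, (cost + step, path + [neighbor]))
    return None, float("inf")


# Hypothetical graph, for illustration only
graph = {"S": [("A", 1), ("B", 4)], "A": [("B", 2), ("G", 6)], "B": [("G", 2)]}
print(uniform_cost_search(graph, "S", "G"))  # (['S', 'A', 'B', 'G'], 5)
```

Here the path S-B (cost 4) is popped after B was already extended through S-A-B (cost 3), so it is discarded.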
(a) k=10
(b) k=29
(c) k=9
(d) k=22
(e) k=20
Step  Queue (path: cost)
1     S: 10
2     SB: 6, SA: 11
3     SA: 11, SBD: 14, SBA: 17
4     SAB: 8, SAD: 11, SBD: 14, SBA: 17
5     SAD: 11, SBD: 14, SBA: 17
6     SADB: 11, SADE: 11, SADG: 11, SBD: 14, SBA: 17
7     SADE: 11, SADG: 11, SBD: 14, SBA: 17
8     SADEG: 9, SADG: 11, SBD: 14, SBA: 17
5. To detect an object in an image of size 4x4, a designer selects a filter of size 2x2 with a
bias of zero as shown below:
Image (4x4):
1 1 0 0
0 1 1 0
0 0 1 1
1 0 0 1
Filter (2x2):
1 0
0 1
2 2 0 2 3 1
0 2 2 2 0 3 1 0 2 0 3 2
0 0 2 0 2 1 3 2 0 0 0 2
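The feature map for this image and filter can be checked with a short NumPy sketch. It assumes stride 1 and no padding, since the surviving text does not state either. CNN "convolution" is usually implemented as cross-correlation (no kernel flip). This particular filter is unchanged by a 180-degree rotation, so flipped and unflipped convolution give the same result here.

```python
import numpy as np


def conv2d_valid(image, kernel, bias=0.0):
    """Stride-1, no-padding 2D convolution as used in CNNs (no kernel flip)."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            # Element-wise product of the kernel with the window, summed, plus bias
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel) + bias
    return out


image = np.array([[1, 1, 0, 0],
                  [0, 1, 1, 0],
                  [0, 0, 1, 1],
                  [1, 0, 0, 1]])
kernel = np.array([[1, 0],
                   [0, 1]])
print(conv2d_valid(image, kernel))
```

Each output entry is image[i, j] + image[i+1, j+1], which gives the 3x3 map 2 2 0 / 0 2 2 / 0 0 2.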
2 2 7 3
9 4 6 1
8 5 2 4
3 1 2 6
9 9 7 7
9 9 7 7 9 7 7
9 7 8 8 6 6 5 2 9 6 6
8 6 8 8 6 6 9 1 2 8 5 6
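The 4x4 matrix above appears to be the input to a pooling question whose stem was lost in extraction. A minimal max-pooling sketch follows. The 2x2 window is an assumption, and because the stride is not recoverable, both stride 1 and stride 2 are shown.

```python
import numpy as np


def max_pool(x, size=2, stride=1):
    """Max pooling with a square window of side `size` and the given stride (no padding)."""
    out_h = (x.shape[0] - size) // stride + 1
    out_w = (x.shape[1] - size) // stride + 1
    out = np.empty((out_h, out_w), dtype=x.dtype)
    for i in range(out_h):
        for j in range(out_w):
            r, c = i * stride, j * stride
            out[i, j] = x[r:r + size, c:c + size].max()   # largest value in the window
    return out


feature_map = np.array([[2, 2, 7, 3],
                        [9, 4, 6, 1],
                        [8, 5, 2, 4],
                        [3, 1, 2, 6]])
print(max_pool(feature_map, size=2, stride=1))  # 3x3 result
print(max_pool(feature_map, size=2, stride=2))  # 2x2 result
```

With stride 1 the result is 9 7 7 / 9 6 6 / 8 5 6. With stride 2 it is 9 7 / 8 6.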
8. Suppose we have two machine learning models, Model1 and Model2. These models are trained and tested on some data. Model1 has a low error rate on the training set and a high error rate on the testing set. Model2 has a high error rate on both the training set and the testing set. Which of the following statements is most likely to be TRUE?
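The usual reading of these error patterns can be written as a small rule. Low training error with high testing error suggests overfitting (high variance). High error on both sets suggests underfitting (high bias). In the sketch below, the cut-off `high` and the error values are hypothetical, and real cut-offs depend on the task.

```python
def diagnose(train_error, test_error, high=0.2):
    """Rough bias/variance diagnosis from training and testing error rates.

    `high` is a hypothetical cut-off for what counts as a "high" error rate.
    """
    if train_error >= high:
        return "underfitting (high bias)"      # cannot even fit the training data
    if test_error >= high:
        return "overfitting (high variance)"   # fits training data, fails to generalize
    return "good fit"


print(diagnose(0.02, 0.35))  # Model1-like pattern: overfitting (high variance)
print(diagnose(0.40, 0.42))  # Model2-like pattern: underfitting (high bias)
```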
10. What Boolean function is implemented by the following perceptron with weights w1 = w2 = 0.6 and threshold 1?
(a) OR gate
(b) XOR gate
(c) NOT gate
(d) AND gate
(e) None of the above
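Enumerating the truth table is a quick way to check this. The sketch assumes the perceptron fires when the weighted sum reaches the threshold (sum >= 1). Using a strict ">" gives the same table here, because no input pair sums to exactly 1.

```python
def perceptron(x1, x2, w1=0.6, w2=0.6, threshold=1.0):
    """Output 1 when the weighted input sum reaches the threshold, else 0."""
    return 1 if w1 * x1 + w2 * x2 >= threshold else 0


for x1 in (0, 1):
    for x2 in (0, 1):
        print(x1, x2, perceptron(x1, x2))
```

Only (1, 1) gives 0.6 + 0.6 = 1.2 >= 1, while a single active input gives 0.6 < 1. This is the truth table of an AND gate.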
12. Which of the following statement(s) are true about SVM kernels?
X Y Z
0 0 1
0 1 0
1 0 0
1 1 1
Indicate the correct thresholds T1, T2, and T3 to implement the function using the MLP shown. Note that the T1 and T2 perceptrons are Universal OR gates, while the T3 perceptron is a Universal AND gate.
(a) T1 = 1, T2 = -1, T3 = 2
(b) T1 = -1, T2 = 1, T3 = -1
(c) T1 = 0, T2 = 0, T3 = 2
(d) T1 = 1, T2 = -1, T3 = 1
(e) T1 = 0, T2 = 1, T3 = -1
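The table above is XNOR, which can be written as (X OR NOT Y) AND (NOT X OR Y). The MLP figure did not survive extraction, so this sketch assumes the common universal-gate convention. Each plain input gets weight +1 and each negated input gets weight -1. An OR unit fires when the sum reaches 1 minus the number of negated inputs, and an AND unit fires when the sum reaches the number of plain inputs. Under those assumptions T1 = 0, T2 = 0, and T3 = 2, which corresponds to option (c).

```python
def unit(inputs, weights, threshold):
    """Threshold perceptron: output 1 if the weighted sum is >= threshold."""
    return 1 if sum(w * v for w, v in zip(weights, inputs)) >= threshold else 0


def xnor_mlp(x, y, t1=0, t2=0, t3=2):
    h1 = unit((x, y), (1, -1), t1)      # X OR (NOT Y); assumed weights +1, -1
    h2 = unit((x, y), (-1, 1), t2)      # (NOT X) OR Y; assumed weights -1, +1
    return unit((h1, h2), (1, 1), t3)   # h1 AND h2


for x, y in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(x, y, xnor_mlp(x, y))
```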
14. When are you more likely to consider using SVM over other classifiers?
(a) When there is a need to increase data points (more training data).
(b) When there is a need to decrease data points (remove some training data).
(c) When there is a need to calculate more variables that are related to the training data
to have a better classification.
(d) When other classifiers fail due to the lack of hidden structure in the training data.
(e) None of the above.
Suppose we use backtracking search with filtering (forward checking) and the Minimum-Remaining-Values (MRV) heuristic. Which of the following is the correct sequence of color assignments to all nodes, starting with the assignment A = C1?
(a) A, B, E, D, F, C, H, G, I
(b) A, B, C, D, E, F, G, H, I
(c) A, B, E, D, F, C, G, H, I
(d) A, D, E, B, F, C, H, G, I
(e) None of the above
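The map for this question is not in the extracted text, so the sketch below runs backtracking with forward checking and MRV on a small hypothetical map. After each assignment, forward checking removes the chosen color from every neighbor's domain and fails early if some domain becomes empty. MRV then picks the unassigned variable with the fewest remaining colors. Breaking ties alphabetically is an assumption, since exams usually state their own tie-breaking rule.

```python
def color_map(neighbors, colors, first_var, first_value):
    """Backtracking with forward checking and the MRV heuristic.

    Returns (assignment, order): the final coloring and the order in which
    variables were assigned on the successful branch.
    """
    order = []

    def forward_check(var, value, domains):
        # Copy the domains, fix var, and remove value from each neighbor's domain.
        new = {v: list(d) for v, d in domains.items()}
        new[var] = [value]
        for n in neighbors[var]:
            if value in new[n]:
                new[n].remove(value)
                if not new[n]:
                    return None          # a neighbor has no legal color left
        return new

    def backtrack(assignment, domains):
        if len(assignment) == len(neighbors):
            return assignment
        unassigned = [v for v in neighbors if v not in assignment]
        # MRV: smallest remaining domain; ties broken alphabetically (assumption)
        var = min(unassigned, key=lambda v: (len(domains[v]), v))
        for value in domains[var]:
            pruned = forward_check(var, value, domains)
            if pruned is None:
                continue
            assignment[var] = value
            order.append(var)
            result = backtrack(assignment, pruned)
            if result is not None:
                return result
            del assignment[var]              # undo and try the next color
            order.pop()
        return None

    domains = forward_check(first_var, first_value,
                            {v: list(colors) for v in neighbors})
    if domains is None:
        return None, []
    order.append(first_var)
    return backtrack({first_var: first_value}, domains), order


# Hypothetical map (not the exam's graph)
neighbors = {
    "A": ["B", "E"],
    "B": ["A", "C", "E"],
    "C": ["B", "D"],
    "D": ["C"],
    "E": ["A", "B"],
}
coloring, order = color_map(neighbors, ["C1", "C2", "C3"], "A", "C1")
print(order)     # ['A', 'B', 'E', 'C', 'D']
print(coloring)
```

In this hypothetical map, MRV picks E before C because E is left with a single color once A and B are assigned.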
(a) The classifier that finds the smallest distance among support vectors of different
classes.
(b) The classifier that finds some arbitrary line that can classify different sets of data.
(c) The classifier that uses a kernel to find a good line that separates different classes.
(d) The classifier that finds the widest distance among support vectors of different
classes.
(e) None of the above.
(a) For step size η > 0.6, the gradient descent can escape the local minimum and
attempt to find the global minimum.
(b) For step size η < 0.3, the gradient descent can guarantee convergence to the global
minimum.
(c) If we start with step size η = 1.0 rather than η = 0.2, gradient descent guarantees
convergence to the global minimum.
(d) For any step size η, gradient descent cannot find the global minimum once it
finds the local minimum at X5.
(e) None of the above.
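The figure with the local minimum at X5 did not survive extraction. This sketch therefore uses a hypothetical function, f(x) = x^4 - 3x^2 + x, which has a local minimum near x = 1.13 and a lower (global) minimum near x = -1.30. With a small step size, plain gradient descent settles into whichever minimum's basin it starts in. This is why a small η alone gives no guarantee of reaching the global minimum.

```python
def f(x):
    return x**4 - 3 * x**2 + x          # hypothetical non-convex error function


def grad(x):
    return 4 * x**3 - 6 * x + 1         # derivative of f


def gradient_descent(x0, eta, steps=2000):
    x = x0
    for _ in range(steps):
        x -= eta * grad(x)              # move against the gradient
    return x


x_right = gradient_descent(2.0, eta=0.01)    # starts in the local minimum's basin
x_left = gradient_descent(-2.0, eta=0.01)    # starts in the global minimum's basin
print(round(x_right, 3), round(f(x_right), 3))
print(round(x_left, 3), round(f(x_left), 3))
```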
What is the total number of weights and biases that need to be trained if the neural network has a fully connected architecture?
(a) 15
(b) 20
(c) 60
(d) 35
(e) 44
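In a fully connected network, a layer with n_in inputs and n_out units has n_in × n_out weights plus n_out biases. The network figure is missing, so the layer sizes below are hypothetical. Substitute the sizes from the figure to get the answer.

```python
def count_parameters(layer_sizes):
    """Weights plus biases of a fully connected feed-forward network.

    layer_sizes = [inputs, hidden_1, ..., outputs]; every unit in a layer
    connects to every unit in the next layer, and each non-input unit has one bias.
    """
    total = 0
    for n_in, n_out in zip(layer_sizes, layer_sizes[1:]):
        total += n_in * n_out + n_out   # weight matrix + bias vector
    return total


print(count_parameters([3, 4, 4, 2]))  # hypothetical sizes: 16 + 20 + 10 = 46
```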
21. Which of the following is NOT true about the backpropagation algorithm in neural networks?
(a) The forward pass involves computing partial derivatives of an error function with
respect to the network parameters.
(b) The algorithm is used for neural networks learning.
(c) The algorithm uses gradient descent to minimize a divergence function.
(d) Adjusting the neural network parameters depends on the gradients computations.
(e) The activation functions used must be differentiable.
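The split between the two passes shows up even in a single linear neuron with squared error, which is a simplifying assumption for illustration. The forward pass only computes the prediction. The partial derivatives of the error with respect to the parameters come from the backward pass via the chain rule. This is why statement (a) does not describe backpropagation correctly.

```python
def forward(w, b, x):
    """Forward pass: compute the prediction only (no derivatives)."""
    return w * x + b


def backward(w, b, x, target):
    """Backward pass: partial derivatives of E = (y - target)**2 w.r.t. w and b."""
    y = forward(w, b, x)
    dE_dy = 2 * (y - target)
    return dE_dy * x, dE_dy      # dE/dw, dE/db via the chain rule


w, b, lr = 0.5, 0.0, 0.05
x, target = 2.0, 3.0
for _ in range(50):
    dw, db = backward(w, b, x, target)
    w -= lr * dw                 # gradient-descent update of each parameter
    b -= lr * db
print(forward(w, b, x))          # close to the target 3.0
```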
What will be the output y, given the inputs X1 = 3, X2 = 2, and X3 = 4 for each of the
following activation functions?
(a) 0 for the Threshold activation function, 1 for ReLU activation function
(b) 11 for the Threshold activation function, 11 for ReLU activation function
(c) 0 for the Threshold activation function, 0 for ReLU activation function
(d) 1 for the Threshold activation function, 11 for ReLU activation function
(e) 1 for the Threshold activation function, 16 for ReLU activation function
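Both activation functions act on the same weighted sum z = w1·X1 + w2·X2 + w3·X3 + bias. The threshold (step) function outputs only 0 or 1, while ReLU passes any non-negative z through unchanged. The neuron's weights are in the missing figure, so the weights below are hypothetical. The step convention (fire when z >= 0) is also an assumption.

```python
def weighted_sum(inputs, weights, bias=0.0):
    return sum(w * x for w, x in zip(weights, inputs)) + bias


def threshold(z):
    """Step activation: 1 if z >= 0, else 0."""
    return 1 if z >= 0 else 0


def relu(z):
    """ReLU activation: z if z >= 0, else 0."""
    return max(0, z)


inputs = (3, 2, 4)          # X1, X2, X3 from the question
weights = (1, 2, -1)        # hypothetical; the real weights are in the missing figure
z = weighted_sum(inputs, weights)
print(z, threshold(z), relu(z))
```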
What are the initial cluster assignments? (Which sample points are in cluster c1 and
which sample points are in cluster c2?)
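The initial assignment in k-means puts each sample point into the cluster whose initial centroid is nearest. The sample points and initial centroids for this question are not in the extracted text, so the values below are hypothetical. The sketch uses Euclidean distance, and ties go to the first centroid listed.

```python
import math


def assign_clusters(points, centroids):
    """K-means assignment step: each point joins its nearest centroid (Euclidean)."""
    clusters = {name: [] for name in centroids}
    for p in points:
        nearest = min(centroids, key=lambda name: math.dist(p, centroids[name]))
        clusters[nearest].append(p)
    return clusters


points = [(1, 1), (2, 1), (4, 3), (5, 4)]     # hypothetical sample points
centroids = {"c1": (1, 1), "c2": (5, 4)}      # hypothetical initial centroids
print(assign_clusters(points, centroids))
```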
24. The following stages are required in a CNN: (1) Pooling (2) Flattening (3) Softmax activation function (4) Convolution. Suppose that we use each stage only once in our CNN. In which order are these stages performed?
(a) 4, 1, 3, 2
(b) 3, 1, 4, 2
(c) 4, 1, 2, 3
(d) 2, 3, 4, 1
(e) 3, 4, 2, 1
(a) “O” (if k=1), “O” (if k=3), “O” (if k=7)
(b) “X” (if k=1), “O” (if k=3), “X” (if k=7)
(c) “X” (if k=1), “X” (if k=3), “O” (if k=7)
(d) “X” (if k=1), “X” (if k=3), “X” (if k=7)
(e) “O” (if k=1), “X” (if k=3), “X” (if k=7)
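A k-NN classifier predicts the majority label among the k training points nearest to the query, so the prediction can change as k grows. The exam's data points are not in the extracted text, so the points and query below are hypothetical. Odd k values avoid ties between two classes.

```python
import math
from collections import Counter


def knn_predict(training, query, k):
    """Majority vote among the k training points closest to `query` (Euclidean)."""
    nearest = sorted(training, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical labeled points and query point
training = [((1, 1), "X"), ((2, 2), "O"), ((2, 0), "O"), ((3, 3), "X"),
            ((0, 3), "X"), ((4, 1), "O"), ((5, 5), "O")]
query = (1.2, 1.2)
for k in (1, 3, 7):
    print(k, knn_predict(training, query, k))
```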
26. For a convex error function (i.e., with a bowl shape), which of the following is true, given
the optimal learning rate?
w1 = 1 b1 = 0
w2 = -1 b2 = 0
w3 = 1 b3 = 0
w4 = 1
Assume that all neurons use the ReLU activation function, given by
$$f(z) = \begin{cases} z, & \text{if } z \ge 0 \\ 0, & \text{otherwise} \end{cases}$$
Which of the following are the correct values of the output y when the input x = 5 and
x = -5, respectively?
(a) y = 5, y = 5
(b) y = -10, y = 10
(c) y = 10, y = -10
(d) y = 0, y = 0
(e) y = -5, y = 5
28. Referring to question 27, which of the following functions represents the output y of the given ANN?
(a) y = -x
(b) y = min (0, x)
(c) y = x
(d) y = |x|
(e) y = max(0, x)
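The network figure for questions 27 and 28 is missing. The listed parameters suggest two hidden ReLU units, h1 = f(w1·x + b1) and h2 = f(w2·x + b2), feeding one output unit y = f(w3·h1 + w4·h2 + b3). That topology is an assumption. Under it, one hidden unit passes positive inputs and the other passes the magnitude of negative inputs, so the network computes y = |x|.

```python
def relu(z):
    return z if z >= 0 else 0


def network(x, w1=1, b1=0, w2=-1, b2=0, w3=1, w4=1, b3=0):
    h1 = relu(w1 * x + b1)                  # active for x >= 0, outputs x
    h2 = relu(w2 * x + b2)                  # active for x <= 0, outputs -x
    return relu(w3 * h1 + w4 * h2 + b3)     # output unit sums both paths


for x in (5, -5):
    print(x, network(x))                    # 5 -> 5, -5 -> 5
```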
30. Consider the following table representing the heuristics h1(x), h2(x), and h3(x) for estimating the distance from each node x to the goal node G in the graph below. Which of these heuristics is/are admissible?
x       S  A  B  C  D  G
h1(x)   7  6  7  3  1  0
h2(x)   6  6  3  3  1  0
h3(x)   5  5  3  1  1  0
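A heuristic is admissible if it never overestimates the true cheapest cost to the goal, that is, h(x) <= h*(x) for every node x. The graph is not in the extracted text, so the true costs h*(x) below are hypothetical placeholders. Replace them with the cheapest path costs read from the graph before drawing any conclusion about the exam answer.

```python
def is_admissible(h, true_cost):
    """Admissible: h(x) <= h*(x) for every node (never overestimates)."""
    return all(h[node] <= true_cost[node] for node in true_cost)


h1 = {"S": 7, "A": 6, "B": 7, "C": 3, "D": 1, "G": 0}
h2 = {"S": 6, "A": 6, "B": 3, "C": 3, "D": 1, "G": 0}
h3 = {"S": 5, "A": 5, "B": 3, "C": 1, "D": 1, "G": 0}

# Hypothetical true cheapest costs to G (the real graph is not available)
true_cost = {"S": 6, "A": 5, "B": 4, "C": 2, "D": 1, "G": 0}

for name, h in (("h1", h1), ("h2", h2), ("h3", h3)):
    print(name, is_admissible(h, true_cost))
```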
a. Ignoring the point “T” labeled with class “New”, draw on the graph above the decision boundaries for 1-Nearest Neighbor. (5 points)
b. Identify the class that a k-NN classifier will predict for the point labeled “New” for each value of k shown in the table below. (5 points)
a. Draw the architecture of an MLP that can classify the data within the shapes shown in the graph above. Assume that all biases in the MLP are set to zero. [2.5 Points]
b. On the drawing, show the threshold value of each perceptron by writing it inside the perceptron. [2.5 Points]
c. On the drawing, clearly show the weight used for each connection in your architecture. [10 Points]
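The usual construction has one hidden perceptron per edge of a convex shape, each firing on the inner side of its edge. An AND unit then combines the edges of each shape, and an OR unit combines the shapes. The shapes in the exam graph are not available, so this sketch uses two hypothetical axis-aligned squares. With all biases at zero, each edge offset is carried by the perceptron's threshold, matching the setup in part (b).

```python
def perceptron(inputs, weights, threshold):
    """Output 1 if the weighted sum is >= threshold (bias fixed at zero)."""
    return 1 if sum(w * v for w, v in zip(weights, inputs)) >= threshold else 0


def inside_square(x, y, x_min, x_max, y_min, y_max):
    """Hidden layer: one perceptron per edge; then an AND of the four edges."""
    edges = [
        perceptron((x, y), (1, 0), x_min),      # x >= x_min
        perceptron((x, y), (-1, 0), -x_max),    # x <= x_max
        perceptron((x, y), (0, 1), y_min),      # y >= y_min
        perceptron((x, y), (0, -1), -y_max),    # y <= y_max
    ]
    return perceptron(edges, (1, 1, 1, 1), 4)   # AND: all four edges must fire


def classify(x, y):
    """OR of two hypothetical squares: [0,1]x[0,1] and [2,3]x[2,3]."""
    a = inside_square(x, y, 0, 1, 0, 1)
    b = inside_square(x, y, 2, 3, 2, 3)
    return perceptron((a, b), (1, 1), 1)        # OR: at least one shape


print(classify(0.5, 0.5), classify(2.5, 2.5), classify(1.5, 1.5))  # 1 1 0
```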