
NATIONAL UNIVERSITY OF SINGAPORE

EE2211 - Introduction to Machine Learning


(Semester 2 : AY2021/2022)

Time Allowed : 2 Hours

INSTRUCTIONS TO STUDENTS

1. This assessment paper contains three types of questions: True/False Questions,
Multiple Option Questions, and Fill-in-the-Blank Questions.
2. There are 16 True/False Questions, each worth 1.5 marks; 17 Multiple Option
Questions, each worth 2 marks; and 9 Fill-in-the-Blank Questions with 26 blanks in
total, each blank worth 2.5 marks. In total there are 123 marks.
3. Students are required to answer ALL questions.
4. This is an open-book assessment.

True/False Questions (1.5 marks * 16 Questions)

1. If we would like to use imputation for data cleaning, we may consider replacing the
missing value using the average from other samples.
a) True
b) False

2. Data integrity concerns the maintenance and the assurance of data accuracy and
consistency.
a) True
b) False

3. Assume we have two urns. The first contains 6 white and 5 black balls, and the
second contains 4 white and 5 black balls. We flip a fair coin, and pick the first urn
if the coin shows a head while we pick the second urn if the coin shows a tail. We
then draw a ball from the chosen urn. The probability of eventually getting a white
ball is greater than a black ball.
a) True
b) False
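
One way to check the claim numerically is the law of total probability; a minimal Python sketch:

```python
# Sketch: total probability of drawing a white ball in question 3.
p_white = 0.5 * (6 / 11) + 0.5 * (4 / 9)     # P(head)*P(white|urn1) + P(tail)*P(white|urn2)
p_black = 1 - p_white
print(round(p_white, 4), round(p_black, 4))  # compare the two probabilities
```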

4. The One Hot Encoding (OHE) can be applied to a binary target.


a) True
b) False

5. The polynomial model can approximate any continuous real-valued function on a
closed and bounded interval to any degree of accuracy.
a) True
b) False

6. There are three samples of two-dimensional data points $\mathbf{X} = \begin{bmatrix} 1 & 3 \\ 4 & -1 \\ 3 & 3 \end{bmatrix}$ with the
corresponding single output target vector $\mathbf{y} = \begin{bmatrix} 1 \\ 2 \\ 1 \end{bmatrix}$. Suppose you want to use a full
third-order polynomial model to fit these data; the polynomial model has 9
parameters to learn.
a) True
b) False
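
For reference, a full third-order polynomial in two variables contains every monomial $x_1^i x_2^j$ with $i + j \le 3$, including the constant term; a minimal sketch to count them:

```python
# Sketch: enumerate the monomials x1**i * x2**j with i + j <= 3 (question 6).
from itertools import product

terms = [(i, j) for i, j in product(range(4), repeat=2) if i + j <= 3]
print(len(terms))   # number of parameters of the full 3rd-order model
print(terms)        # exponent pairs, (0, 0) being the constant term
```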

7. Suppose the model parameters are given by $\mathbf{w} = [w_1, \ldots, w_d]^T$. Then an inclusion of
the L2-regularization during training reduces the value of $\sum_{i=1}^{d} w_i^2$.
a) True
b) False

8. Suppose we are minimizing a cost function C(w) with respect to w using a gradient
descent algorithm. We observe the following: At iteration 1, C(w) is -15. At iteration
2, C(w) becomes 15. At iteration 3, C(w) becomes -7. At iteration 4, C(w) becomes
12. At iteration 5, C(w) becomes -13. To improve our algorithm, we should
consider decreasing the learning rate.
a) True
b) False
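
A sign-flipping, non-settling cost is the classic symptom of a step size that is too large. A toy illustration on $C(w) = w^2$ (the values here are made up for the demonstration):

```python
# Sketch: gradient descent on C(w) = w**2 with two different learning rates.
def descend(lr, w=4.0, steps=5):
    path = []
    for _ in range(steps):
        w = w - lr * 2 * w          # gradient of w**2 is 2*w
        path.append(round(w, 3))
    return path

print(descend(lr=0.9))   # overshoots: the sign of w flips at every step
print(descend(lr=0.1))   # converges smoothly towards the minimum at 0
```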

9. Compared with a decision tree, a random forest has lower squared bias and the
same variance.
a) True
b) False

10. To enhance the model generalization capability, we should wisely select samples
from the test set and utilize them in the training stage, which will potentially
improve the performances.
a) True
b) False

11. A higher training accuracy does not necessarily lead to a higher test accuracy;
however, in most cases, a higher training accuracy does lead to a higher
validation accuracy.
a) True
b) False

12. Suppose we would like to develop a human face detector, where the positive class
corresponds to face and the negative class corresponds to non-face. In this case, a
false positive of the positive class denotes the face(s) missed by the detector.
a) True
b) False

13. We have built a binary classifier. During testing, it is possible that the sum of the
false-positive-rate and the false-negative-rate is greater than 100%.
a) True
b) False

14. In the naïve K-means method, the optimal K is learned automatically.


a) True
b) False
EE2211 - Introduction to Machine Learning/ Page 4

15. The confusion matrix is a popular metric for evaluating regression tasks.
a) True
b) False

16. Recent research in neural networks mainly focuses on manually setting the
parameters of the networks smartly, so that they can generalize well to unseen
data.
a) True
b) False

Multiple Option Questions (2 marks * 17 Questions)

1. Which of the following statement(s) is/are true about machine learning? Note that
there might be more than one true option. If so, you should select all the correct
options in order to get all the marks.
a) Jack went hiking with his friends and collected a bag of colorful stones. He
would like to group these stones by color. As such, Jack is doing a
supervised classification on the stones.
b) Jack went hiking with his friends and took 100 pictures of birds. One of his
friends David, who is a scientist working on bird species, looked at all the
pictures and told Jack there are in total five bird species. He picked one
picture from each species, showed it to Jack, and told Jack the name of the
species. Jack went home and categorized the rest of the pictures based on
Jack’s annotation. As such, Jack is doing a supervised classification on the
bird pictures.
c) Jack went hiking with his friends and collected a bag of colorful stones.
After he went home, he measured the weight, size, and color of each stone,
before grouping them into different categories. The weight, size, and color
measurement step can be thought of as the feature extraction step.
d) None of the others.

2. Which of the following statement(s) is/are true about data wrangling? Note that there
might be more than one true option. If so, you should select all the correct options in
order to get all the marks.
a) Data wrangling concerns transforming "raw" data into an appropriate form
for downstream purposes such as analytics.
b) Assume we have three classes, car, truck, bicycle, and we would like to
use one-hot encoding to denote the three classes. One possible
representation is as follows: ‘car’=[0,0,1], ‘truck’=[1,0,0], ‘bicycle’=[1,1,1].
c) There is no need to conduct data wrangling if we know a priori that there
are no noisy samples in the data records.
d) None of the others.
EE2211 - Introduction to Machine Learning/ Page 5

3. The discrete random variable X has the following probability mass function,

X      1     2     3     4
P[X]   0.1   2k    2k    5k

What is the value of k?
a) k=0.1
b) k= 0.2
c) k=0.3
d) k=0.02
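
As a worked step (the normalization condition of a probability mass function):

$$0.1 + 2k + 2k + 5k = 1 \;\Rightarrow\; 9k = 0.9 \;\Rightarrow\; k = 0.1.$$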

4. This question is related to the understanding of linear systems and partial
derivatives. Which of the following statements below is/are correct? There can be
more than one answer.

a) In under-determined linear systems, the number of parameters (unknowns) is
greater than the number of equations.

b) The system $\begin{bmatrix} 1 & 4 \\ 2 & 7 \\ -3 & 11 \end{bmatrix} \begin{bmatrix} w_1 \\ w_2 \end{bmatrix} = \begin{bmatrix} 1 \\ -2.5 \\ 4 \end{bmatrix}$ has no exact solution, but an approximate
solution is available using the left inverse.

c) If $\mathbf{f}(\mathbf{x})$ is a vector-valued function of size p x 1 and $\mathbf{x}$ is an m x 1 vector, then the
differentiation of $\mathbf{f}(\mathbf{x})$ with respect to $\mathbf{x}$ is an m x p matrix.

d) Consider the linear system $\mathbf{X}\mathbf{w} = \mathbf{y}$, where $\mathbf{X} \in R^{m \times d}$ is the input data matrix,
$\mathbf{w} \in R^{d \times 1}$ is the parameter vector, and $\mathbf{y} \in R^{m \times 1}$ is the target vector. If $d > m$, the
system has more equations than parameters.
e) None of the others.

5. This question is related to the understanding of linear systems, ridge regression, and
polynomial regression. Which of the following statements below is/are correct?
There can be more than one answer.

a) A set of linear equations is written as $\mathbf{X}\mathbf{w} = \mathbf{y}$, where $\mathbf{X} \in \mathcal{R}^{4 \times 3}$ is the input data
matrix, $\mathbf{w} \in \mathcal{R}^{3 \times 1}$ is the parameter vector, and $\mathbf{y} \in \mathcal{R}^{4 \times 1}$ is the target vector. There
are three simultaneous equations in this set of equations.

b) The ridge regression can be applied to multi-target regression.

c) The solution for a ridge regression problem with feature vectors in X and targets
in y can be written as $\hat{\mathbf{w}} = (\mathbf{X}^T\mathbf{X} + \lambda\mathbf{I})^{-1}\mathbf{X}^T\mathbf{y}$ for λ > 0. As λ increases, $\hat{\mathbf{w}}^T\hat{\mathbf{w}}$
increases.

d) Given four samples of two-dimensional data points $\mathbf{X}$ and the corresponding
target output y, the 2nd-order polynomial regression system is an under-determined
system.

e) None of the others.

6. You are given a collection of 5 training data points of three features $(x_1, x_2, x_3)$ and
their corresponding class labels, which are stacked as follows:

$$\mathbf{X} = \begin{bmatrix} 1 & 3 & -2 \\ -4 & 0 & -1 \\ 3 & 1 & 8 \\ 2 & 1 & 6 \\ 8 & 4 & 6 \end{bmatrix}, \qquad \mathbf{Y}_{\text{class}} = \begin{bmatrix} \text{class } 1 \\ \text{class } 1 \\ \text{class } 2 \\ \text{class } 3 \\ \text{class } 3 \end{bmatrix}.$$

We would like to predict the class labels of new test samples. Which of the following
statements below is/are correct? There can be more than one answer.

a) If we would like to perform multi-class classification using the optimal linear regression
model with an offset term, the estimated parameter matrix W is of dimension 3 x 4.

b) Given a new sample $(x_1, x_2, x_3) = (1, -2, 4)$, the predicted class label using the
optimal linear regression model with an offset term is class 3.

c) If we would like to perform multi-class classification using the optimal second-
order polynomial model, the corresponding polynomial expansion matrix P (with
respect to X) is of dimension 5 x 10. Therefore, it is an under-determined system.

d) For two new samples $\mathbf{X}_{\text{new}}$, after learning and testing using some polynomial
model, the prediction is $\mathbf{Y}_{\text{new}} = \mathbf{P}_{\text{new}}\hat{\mathbf{W}} = \begin{bmatrix} -0.58 & 0.40 & -0.67 \\ 1.25 & 0.10 & -0.20 \end{bmatrix}$. Therefore, the class
label prediction for the first sample is class 3 and for the second sample is class 1.

e) None of the others.
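
For reference on the dimension claims, a minimal sketch under the usual setup (bias column prepended, labels one-hot encoded; the encoding order is an assumption):

```python
# Sketch (question 6): shapes in multi-class linear regression with an offset term.
import numpy as np

X = np.array([[1, 3, -2], [-4, 0, -1], [3, 1, 8], [2, 1, 6], [8, 4, 6]], dtype=float)
Xa = np.hstack([np.ones((5, 1)), X])                   # bias column -> 5 x 4
Y = np.array([[1, 0, 0], [1, 0, 0], [0, 1, 0],
              [0, 0, 1], [0, 0, 1]], dtype=float)      # one-hot class labels

W = np.linalg.lstsq(Xa, Y, rcond=None)[0]              # least-squares estimate
print(W.shape)                                         # (4, 3), one column per class

x_new = np.array([1.0, 1.0, -2.0, 4.0])                # sample (1, -2, 4) with bias
print(np.argmax(x_new @ W) + 1)                        # predicted class index
```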

7. Suppose your machine learning algorithm achieves 95% accuracy on your training
set, but only 10% accuracy on your test set. Which of the following modifications
might potentially improve your algorithm's test accuracy? Note that there might be
more than one true option. If so, you should select all the correct options in order to
get all the marks.
(1) Decrease Regularization
(2) Increase Regularization
(3) Decrease Model Complexity
(4) Increase Model Complexity
(5) Decrease Number of Features
(6) Increase Number of Features

8. Suppose we are minimizing a cost function C(w) with respect to w using a gradient
descent algorithm. We observe the following: At iteration 1, C(w) is 11. At iteration
2, C(w) becomes 10.9. At iteration 3, C(w) becomes 10.7. At iteration 4, C(w)
becomes 10.6. At iteration 5, C(w) becomes 10.55. Which of the following is true?
Note that there might be more than one true option. If so, you should select all the
correct options in order to get all the marks.
(1) Decreasing the learning rate will speed up the optimization further
(2) Increasing the learning rate will speed up the optimization further
(3) Adding a regularization to the cost function will speed up the optimization
further
(4) Insufficient information in the question to tell which option is true

9. Consider the cost function C(w) = DataLoss(w) + λRegularization(w). Assume that
the global minimum of C(w) when λ = 2 is 12. Now we change λ to be equal to 20
and again minimize C(w). Assuming we attain the global minimum, the new optimal
cost function value C
(1) will be higher than before
(2) will be lower than before
(3) will stay the same
(4) Insufficient information in the question to tell which option is true.

10. Consider the cost function C(w) = DataLoss(w) + λRegularization(w). When λ = 3,
the minimum of C(w) is achieved at w = w* and C(w*) = 25. Now we change λ to
be equal to 30 and minimize C(w). The minimum now is achieved at w = w**.
Assume that we always achieve the global minimum. Which of the following is
true? Note that there might be more than one true option. If so, you should select
all the correct options.
(1) In general, Regularization(w*) > Regularization (w**)
(2) In general, Regularization (w*) < Regularization (w**)
(3) In general, DataLoss(w*) > DataLoss(w**)
(4) In general, DataLoss(w*) < DataLoss(w**)
(5) Insufficient information in the question to tell which option is true.

11. Suppose in a 2-class classification problem, our decision tree algorithm achieves 52%
accuracy on our training set, and also 52% accuracy on our test set. Which of the
following modifications might potentially improve our algorithm's test accuracy? Note
that there might be more than one true option. If so, you should select all the correct
options in order to get all the marks.
(1) Increase maximum depth of tree
(2) Decrease maximum depth of tree
(3) Increase minimum number of samples for splitting a leaf node
(4) Decrease minimum number of samples for splitting a leaf node
(5) Try random forest instead of decision tree

12. Which of the following statement(s) about the training-validation-test is/are true?
Note that there might be more than one true option. If so, you should select all the
correct options in order to get all the marks.

(a) We should always set the number of training samples to be equal to the number
of test samples.
(b) We should always set the number of test samples to be equal to the number of
validation samples.
(c) If no validation data is available, we should test the performances of different
models on the test set, and choose the model that performs the best to be our
model for deployment.
(d) None of the others

13. Which of the following statement(s) is/are true about the confusion matrix in a
classification task? Note that there might be more than one true option. If so, you
should select all the correct options in order to get all the marks.

(a) A reasonable binary classifier should always lead to the following result: True
Negative + True Positive > False Negative + False Positive.
(b) A reasonable binary classifier should always lead to the following result:
False Negative > False Positive.
(c) A reasonable binary classifier should always lead to the following result:
True Positive > True Negative.
(d) None of the others.

14. Which of the following statement(s) is/are true about ROC curve? Note that there
might be more than one true option. If so, you should select all the correct options in
order to get all the marks.

(a) The ROC curve is a widely used evaluation metric.
(b) Area Under the Curve (AUC) is lower-bounded by 0 and upper-bounded by 1.
(c) Given a fixed dataset, a higher Gini coefficient of ROC curve usually indicates
better performance.
(d) None of the others

15. Which of the following statement(s) is/are true about K-means? Note that there might
be more than one true option. If so, you should select all the correct options in order
to get all the marks.

(a) Before convergence, each iteration of K-means is guaranteed not to increase the
objective value.
(b) K-means always produces the same clusters and centroids for different
initializations.
(c) Given a fixed K value, a set of samples, and the objective of K-means, i.e.,
$J = \sum_{i=1}^{N} \sum_{k=1}^{K} w_{ik} \|\mathbf{x}_i - \mathbf{c}_k\|^2$, if we traverse all possible values of $w_{ik}$ for all $i$ and $k$,
we will find the globally optimal solution, which is guaranteed to be no worse than
the result of K-means.
(d) None of the others

16. Which of the following statement(s) is/are true about unsupervised learning? Note
that there might be more than one true option. If so, you should select all the correct
options in order to get all the marks.

(a) In unsupervised learning, we may use the labels of the validation data for
reference.
(b) K-means always yields better results than fuzzy C-means.
(c) K-means never finds the globally optimal solution.
(d) None of the others.

17. Which of the following statement(s) is/are true about deep neural networks? Note
that there might be more than one true option. If so, you should select all the correct
options in order to get all the marks.

(a) If we build a multilayer perceptron network, we do not need non-linear activation
functions.
(b) Backpropagation involves a backward stage but not a forward stage.
(c) Parameters of the neural networks are chosen manually in a smart way, in order
to produce outputs as similar as possible to the labels.
(d) None of the others.

Fill in the Blanks (in total 9 Questions with 26 blanks, each blank worth 2.5 marks,
in total 2.5*26 = 65 marks)

1. You are given a collection of 6 training data points of one feature (x) and their class
label (y) as follows:
x ∊{4, 7, 10} are labelled as y = −1, x ∊{2, 3, 9} are labelled as y = +1.
Answer each of the following subquestions regarding the training outcome or prediction.
Note that the training outcome is in terms of the error count, i.e., the total number of data
points being incorrectly classified. [4 Blanks]

(i) What is the best training error count that you can achieve on these data from a
linear classifier, i.e., f(x) = sign(ax + b), on the original input features?

(ii) You need to predict the class label of a new data point given by $x_t = 6$
when a linear classifier, i.e., f(x) = sign(ax + b), is used for prediction. What is
the class label (-1 or 1) for $x_t$?

(iii) What is the best training error count that you can achieve from a 4th-order
polynomial model, e.g., $f(x) = \mathrm{sign}\!\left(\sum_{k=0}^{4} c_k x^k\right)$?

(iv) You need to predict the class label of a new data point given by $x_t = 6$
when a 4th-order polynomial model, e.g., $f(x) = \mathrm{sign}\!\left(\sum_{k=0}^{4} c_k x^k\right)$, is used for
prediction. What is the class label (-1 or 1) for $x_t$?
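
For (i), a brute-force scan over threshold rules is one way to check the attainable error count, since sign(ax + b) in one dimension is a threshold classifier; a minimal sketch (the threshold grid is arbitrary):

```python
# Sketch: brute-force training error count for a 1-D linear classifier (question 1(i)).
import numpy as np

x = np.array([4, 7, 10, 2, 3, 9], dtype=float)
y = np.array([-1, -1, -1, 1, 1, 1])

best = len(x)
for t in np.linspace(x.min() - 1, x.max() + 1, 500):   # threshold t = -b/a
    for sgn in (1, -1):                                # orientation of a
        pred = sgn * np.sign(x - t)
        best = min(best, int(np.sum(pred != y)))
print(best)   # smallest number of misclassified training points
```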

2. Suppose our training set comprises 5 data samples shown in the table below.
Each data point consists of 3 features (x1, x2, x3) and a regression target y.

x1    3.3459    1.0893    3.2103    1.744     1.6762
x2    2.7435    2.9113    1.4706    1.2895    2.1366
x3   -1.7253   -0.7804   -0.9944    0.5307   -1.0502
y     2.9972    1.1399    2.228     0.3387    2.5042

On the other hand, our test set comprises 4 data samples shown in the table below.

x1    0.9478    1.4931    2.5809    2.0607
x2    1.1619    1.7705    2.9569    1.1695
x3    0.7779   -0.6149   -1.2087    0.4972
y     0.4168    1.4707    2.1037    0.9177

Our goal is to train a quadratic regression model to predict y. Suppose we utilize
Pearson’s correlation to perform feature selection.

For the purpose of feature selection, the correlation between feature x1 and target y
is equal to BLANK1 (your answer should be up to 3 decimal places)

For the purpose of feature selection, the correlation between feature x2 and target y
is equal to BLANK2 (your answer should be up to 3 decimal places)

For the purpose of feature selection, the correlation between feature x3 and target y
is equal to BLANK3 (your answer should be up to 3 decimal places)

Based on the correlations, the best feature is feature BLANK4 (Your answer
should be an integer 1, 2 or 3 corresponding to feature 1, feature 2 or feature 3).

[4 Blanks]
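
A minimal sketch of the correlation computations in question 2, using numpy.corrcoef on the five training samples:

```python
# Sketch: Pearson correlation of each feature with the target y (question 2).
import numpy as np

x1 = np.array([3.3459, 1.0893, 3.2103, 1.744, 1.6762])
x2 = np.array([2.7435, 2.9113, 1.4706, 1.2895, 2.1366])
x3 = np.array([-1.7253, -0.7804, -0.9944, 0.5307, -1.0502])
y = np.array([2.9972, 1.1399, 2.228, 0.3387, 2.5042])

for name, feat in (("x1", x1), ("x2", x2), ("x3", x3)):
    r = np.corrcoef(feat, y)[0, 1]
    print(name, round(r, 3))   # the feature with the largest |r| would be selected
```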

3. We would like to minimize the cost function $C(w) = \sin^2(w)$ using gradient descent.
Suppose we initialize our algorithm with w = 3. The gradient of the cost function at
w = 3 is BLANK1 (your answer should contain up to 3 decimal places). Suppose the
step size is 0.1. After the first round of gradient descent, the updated value of w is
BLANK2 (your answer should be up to 3 decimal places). Note that w is in radians.

[2 Blanks]
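
A minimal sketch of the single update step in question 3; the derivative follows from the chain rule:

```python
# Sketch: one gradient-descent step on C(w) = sin(w)**2 from w = 3 (question 3).
import numpy as np

w = 3.0
grad = 2 * np.sin(w) * np.cos(w)   # dC/dw = 2*sin(w)*cos(w) = sin(2w)
w_new = w - 0.1 * grad             # step size 0.1
print(round(grad, 3), round(w_new, 3))
```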

4. We would like to minimize the cost function $C(x, y) = x^2 + xy^2$ using gradient descent.
Suppose we initialize our algorithm with x = 3, y = 2. Suppose the step size is 0.2.
After the first round of gradient descent, the updated value of x is BLANK1 (your
answer should be up to 2 decimal places) and the updated value of y is BLANK2
(your answer should be up to 2 decimal places).

[2 Blanks]
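
Likewise for question 4, both variables are updated simultaneously from the same starting point:

```python
# Sketch: one gradient-descent step on C(x, y) = x**2 + x*y**2 (question 4).
x, y, lr = 3.0, 2.0, 0.2
dx = 2 * x + y ** 2                 # partial derivative of C with respect to x
dy = 2 * x * y                      # partial derivative of C with respect to y
x, y = x - lr * dx, y - lr * dy     # simultaneous update
print(round(x, 2), round(y, 2))
```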

5. To classify the data points in the plot below with 100% accuracy, a decision tree
needs to have a minimum depth of BLANK (your answer should be a positive
integer)

[1 Blank]

6. Consider the following dataset comprising 10 datapoints: {x, y} = {0.2, 2.1}, {0.7, 1.5},
{1.8, 5.8}, {2.2, 6.1}, {3.7, 9.1}, {4.1, 9.5}, {4.5, 9.8}, {5.1, 12.7}, {6.3, 13.8}, {7.4, 15.9}.
Our goal is to use a regression tree to predict y from x. Suppose at depth 1, we
consider a decision threshold of 3.

The MSE at the root is BLANK1 (your answer should be up to 3 decimal places).

The overall MSE at depth 1 is BLANK2 (your answer should be up to 3 decimal places).

[2 Blanks]
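
A minimal sketch for question 6: the MSE at the root is the variance of y around its mean, and the depth-1 MSE is the sample-weighted average of the two child MSEs under the split at x = 3:

```python
# Sketch: root MSE and weighted depth-1 MSE for the regression tree (question 6).
import numpy as np

x = np.array([0.2, 0.7, 1.8, 2.2, 3.7, 4.1, 4.5, 5.1, 6.3, 7.4])
y = np.array([2.1, 1.5, 5.8, 6.1, 9.1, 9.5, 9.8, 12.7, 13.8, 15.9])

def mse(v):
    return np.mean((v - v.mean()) ** 2)   # squared error around the node mean

left, right = y[x <= 3], y[x > 3]         # split on the threshold 3
root_mse = mse(y)
depth1_mse = (len(left) * mse(left) + len(right) * mse(right)) / len(y)
print(round(root_mse, 3), round(depth1_mse, 3))
```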

7.
In a binary classification task, we use a dataset that contains in total 1000
samples, where we take out 200 samples as the test set. We then conduct
a standard 4-fold cross validation. In each fold, we set the number of
validation samples to be the same as that of test samples. As such, in each
fold, the binary classifier is trained on _BLANK1_ samples.

Now we have three classifier candidates, whose average training accuracy, average
validation accuracy, and number of parameters are given as follows. We should adopt
classifier #_BLANK2_ for the test set.

Classifiers     Average Training   Average Validation   Number of
                Accuracy           Accuracy             Parameters
Classifier #1   90.5%              92.1%                5000
Classifier #2   91.5%              92.1%                3000
Classifier #3   93.5%              90.5%                3000

[2 Blanks]
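
As a sketch of the counting: $1000 - 200 = 800$ samples remain after the test split; matching the validation size to the test size gives $200$ validation samples per fold, so each fold trains on $800 - 200 = 600$ samples.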

8.
Jack developed a classifier to detect spam emails. The classifier conducts
binary classification, and identifies an email to be spam (positive class) or
not spam (negative class). The detected spam emails go into the ‘Spam’
folder and non-spam emails go into the ‘Inbox’ folder.

In an unseen dataset of 100 emails, the classifier achieves the following
performance:

                    Spam (Predicted)   Not Spam (Predicted)
Spam (Actual)       40                 x
Not Spam (Actual)   15                 35

The number x is _BLANK1_.

We can also derive that, among the 100 emails, _BLANK2_ emails have the
actual label of being Spam, and in total _BLANK3_ emails are correctly
classified.

Assume we have the following cost matrix:

                    Spam (Predicted)     Not Spam (Predicted)
Spam (Actual)       $C_{s,s}$ * 40       $C_{s,n}$ * x
Not Spam (Actual)   $C_{n,s}$ * 15       $C_{n,n}$ * 35

We assume that, if an email is correctly classified, we impose no penalty,
i.e., $C_{s,s} = C_{n,n} = 0$. On the other hand, we know that customers will be very
unhappy if the spam folder contains non-spam emails. But customers find it
acceptable that some spam emails go into the inbox folder. In this case,
when comparing two classifiers, we should ensure that
$C_{s,n}$ _BLANK4_ $C_{n,s}$ (choose ‘=’, ‘>’ or ‘<’ and fill in the blank).

With the same dataset, David also developed a classifier that achieves the
following performance:

                    Spam (Predicted)   Not Spam (Predicted)
Spam (Actual)       50                 y
Not Spam (Actual)   10                 25

With the same set of $\{C_{n,n}, C_{s,s}, C_{n,s}, C_{s,n}\}$ as above, David’s classifier is
considered to be _BLANK5_ (1: Better or 2: Worse; choose ‘1’ or ‘2’ and fill in
the blank) than Jack’s.

[5 Blanks]
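
A minimal sketch of the counting behind the first three blanks (the four cells of a confusion matrix must sum to the dataset size):

```python
# Sketch: completing Jack's confusion matrix over 100 emails (question 8).
total = 100
tp, fp, tn = 40, 15, 35        # spam->spam, not-spam->spam, not-spam->not-spam
fn = total - tp - fp - tn      # the unknown x (spam -> not spam)
actual_spam = tp + fn          # emails whose actual label is Spam
correct = tp + tn              # correctly classified emails
print(fn, actual_spam, correct)
```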

9.

We have a collection of 10 books. We measure their thickness in millimetres and
summarize the measurements as follows:

Book ID          01   02   03   04   05   06   07   08   09   10
Thickness (mm)   50   60   66   68   71   72   75   82   90   99

We’d like to group the books into two groups according to their thickness.

1) Assume we pick book 03 as the initial centroid for Group A and book 07 for
Group B, and assign the books to the two groups using Euclidean distance.
Before updating the new centroids, we will have _BLANK1_ books in Group A
(please enter an integer here).

2) With the above group assignments, we re-estimate the new centroids for the
two groups. The new centroid of Group A is _BLANK2_, and the new
centroid of Group B is _BLANK3_ (please specify your answers with two
decimal places).

3) With the updated centroids, we run the group assignment again. In this second
round, the number of books in Group A (the group with the lower centroid) is
_BLANK4_.

[4 Blanks]
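
A minimal sketch of the two assignment rounds in question 9 (1-D K-means with Euclidean distance; any tie would go to Group B under the strict comparison used here):

```python
# Sketch: two K-means assignment rounds on the book thicknesses (question 9).
import numpy as np

t = np.array([50, 60, 66, 68, 71, 72, 75, 82, 90, 99], dtype=float)
ca, cb = 66.0, 75.0                        # initial centroids: books 03 and 07

in_a = np.abs(t - ca) < np.abs(t - cb)     # first assignment round
print(in_a.sum())                          # books in Group A before the update

ca, cb = t[in_a].mean(), t[~in_a].mean()   # re-estimated centroids
print(round(ca, 2), round(cb, 2))

in_a = np.abs(t - ca) < np.abs(t - cb)     # second assignment round
print(in_a.sum())                          # books in the lower-centroid group
```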

END OF PAPER
