ST3189 2022 paper

The document provides instructions for an online assessment for the ST3189 Machine Learning course, scheduled for May 26, 2022. Candidates are required to complete a closed-book take-home exam consisting of four questions within a 3-hour window, with an expected effort of 2 hours. The document emphasizes academic integrity, outlining assessment offences and submission guidelines.


ST3189

BSc DEGREES AND GRADUATE DIPLOMAS IN ECONOMICS, MANAGEMENT,
FINANCE AND THE SOCIAL SCIENCES, THE DIPLOMA IN ECONOMICS AND
SOCIAL SCIENCES AND THE CERTIFICATE IN EDUCATION IN SOCIAL
SCIENCES

Summer 2022 Online Assessment Instructions

ST3189 Machine learning

Thursday 26 May 2022: 09:00 - 12:00 (BST)

The assessment will be a closed-book take-home online assessment within
a 3-hour window. The expected time/effort to answer all questions is 2 hours.

Candidates should answer all FOUR questions. All questions carry equal
marks.

A table of common distributions is provided after the final question of this paper.

You should complete this paper using pen and paper. Please use BLACK INK
only.

Handwritten work then needs to be scanned, converted to PDF and then
uploaded to the exam platform as ONE individual file. Please ensure that your
candidate number is written clearly at the top of each page included in the
scan. Please do not write your name anywhere on your submission.

Workings should be submitted for all questions requiring calculations. Any
necessary assumptions introduced in answering a question are to be stated.

You may use any calculator for any appropriate calculations, but you may not use
any computer software to obtain solutions. Credit will only be given if all workings
are shown.

You have until 12:00 (BST) on Thursday 26 May 2022 to submit your answers.
However, you are advised not to leave your submission to the last minute in order to
allow sufficient time to submit your work.

If you think there is any information missing or any error in any question, then
you should indicate this but proceed to answer the question stating any
assumptions you have made.

© University of London 2022


The assessment has been designed with a duration of 3 hours to provide a more
flexible window in which to complete the assessment. As a closed-book exam, the
expected amount of effort required to complete all questions is no more than 2
hours. Organise your time well. You are assured that in terms of answering all
questions, there will be no benefit in you going beyond the expected 2 hours of
effort. Your assessment has been carefully designed to help you show what you
have learned in the hours allocated.

By accessing this question paper, you agree not to commit any assessment offence.
Assessment offences include (but are not limited to) committing plagiarism and the
use or access of any paid-for or any other services offering live assistance
during an examination. You must not confer with anyone else during a live
examination; and we take conferring to include any exchange of information or
discussion about the assessment with others in any way that could potentially give
you or another student an advantage in the examination. As such, any exchanging
with others of exam questions; or any accessing of websites, blogs, forums or any
other form of oral or written communication with others which involves any
discussion of live examination questions or potential answers/solutions to
exam questions will be considered an assessment offence.

The University of London will conduct checks to ensure the academic integrity of
your work. Many students who break the University of London’s assessment
regulations did not intend to cheat but did not properly understand the University of
London’s regulations on referencing and plagiarism. The University of London
considers all forms of plagiarism, whether deliberate or otherwise, a very
serious matter and can apply severe penalties that might impact on your
award.

The University of London’s Procedure for the Consideration of Allegations of
Assessment Offences is available online at:

Assessment Offence Procedures - University of London



Answer all parts of the following questions.
An appendix with properties of common distributions is provided at the end.

1. (a) The lasso and best subset selection can be used for variable selection. Discuss
the main advantage and disadvantage of the lasso compared with best subset
selection. [4 marks]
(b) Consider the k-nearest neighbours classification using the Euclidean distance
on the dataset shown in Figure 1.
[Figure 1 shows a scatter plot of six labelled training points, four marked “+” and two marked “−”, on axes running from 0 to 8.]

Figure 1: For Question 1 (b).

i. Sketch the 1-nearest neighbour decision boundary and identify the regions
classified as “+” and “−”, respectively. [6 marks]
ii. What is the Leave-One-Out Cross Validation (LOOCV) error when using
3-nearest neighbours? [3 marks]
iii. What is the LOOCV error when using 5-nearest neighbours? [3 marks]
(c) Indicate whether the following statements are true or false. Briefly justify your
answers.
i. If the sensitivity of a classifier increases, so does its specificity. [3 marks]
ii. Quadratic discriminant analysis can only produce a quadratic decision
boundary. [3 marks]
iii. If we train a linear regression estimator on only half the data, the variance
of the estimator will be larger than training it on the entire dataset.
[3 marks]
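Outside the assessment itself (no software is permitted during the exam), LOOCV for k-nearest neighbours can be cross-checked with a short script. The six points below are illustrative placeholders only, NOT the actual coordinates of Figure 1, so the resulting error rates need not match the answers to parts (b) ii and iii.

```python
# LOOCV for k-NN classification on hypothetical 2-D points (illustrative
# stand-ins for Figure 1, not the real exam data).
from collections import Counter
import math

points = [((1, 2), "+"), ((2, 6), "+"), ((3, 3), "+"), ((5, 2), "+"),
          ((6, 6), "-"), ((7, 1), "-")]

def knn_predict(train, x, k):
    # Sort training points by Euclidean distance to x and take a
    # majority vote over the k nearest labels.
    nearest = sorted(train, key=lambda p: math.dist(p[0], x))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

def loocv_error(data, k):
    # Leave each point out in turn, predict it from the rest, count mistakes.
    errors = sum(
        knn_predict(data[:i] + data[i + 1:], x, k) != y
        for i, (x, y) in enumerate(data)
    )
    return errors / len(data)

print(loocv_error(points, 3), loocv_error(points, 5))
```

With k = 5 here, every left-out point is predicted by a majority over all five remaining points, so only the two minority-class points are misclassified.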
2. Consider a linear regression setting where the response variable is y = (y1, . . . , yn)
and there is one feature, or predictor, x = (x1, . . . , xn), where xi > 0 for all
i = 1, . . . , n. We are interested in fitting the following model

yi = β√xi + εi,  i = 1, . . . , n,

where the error terms εi are independent and distributed according to the Normal
distribution with mean 0 and known variance σ². Equivalently, we can write that,
given x, each yi is independent and distributed according to the Normal distribution
with mean β√xi and known variance σ².

(a) Derive the likelihood function for the unknown parameter β. [3 marks]
(b) Derive the Jeffreys prior for β. Use it to obtain the corresponding posterior
distribution. [6 marks]
(c) Consider the Normal distribution prior for β with zero mean and variance ω 2 .
Use it to obtain the corresponding posterior distribution. [6 marks]
(d) Consider the least squares criterion

∑_{i=1}^{n} (yi − β√xi)²,   (1)

and show that the estimator of β that minimises equation (1) also maximises
the likelihood function derived in part (a). Derive this estimator and, in
addition, consider the following penalised least squares criterion

∑_{i=1}^{n} (yi − β√xi)² + λβ²,   (2)

for a given λ > 0. Derive the estimator of β that minimises equation (2) and
compare it with the one that minimises equation (1). [5 marks]
(e) Provide a Bayes estimator for each of the posteriors in parts (b) and (c) and
compare them with the estimators of part (d). [5 marks]
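As a non-authoritative sketch of part (d), assuming only the model yi = β√xi + εi stated above: differentiating each criterion with respect to β and setting the derivative to zero yields closed-form estimators.

```latex
% Sketch (not part of the paper): minimising (1) and (2) in closed form.
\hat\beta
  = \arg\min_\beta \sum_{i=1}^n \bigl(y_i - \beta\sqrt{x_i}\bigr)^2
  = \frac{\sum_{i=1}^n \sqrt{x_i}\, y_i}{\sum_{i=1}^n x_i},
\qquad
\hat\beta_\lambda
  = \arg\min_\beta \Bigl\{\sum_{i=1}^n \bigl(y_i - \beta\sqrt{x_i}\bigr)^2
      + \lambda\beta^2\Bigr\}
  = \frac{\sum_{i=1}^n \sqrt{x_i}\, y_i}{\sum_{i=1}^n x_i + \lambda}.
```

The penalised estimator shrinks the least squares estimator towards zero, the hallmark of ridge-type regularisation.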
3. (a) Consider the regression task of predicting the variable Y based on the variable
X given the following training sample:
Y X
7 8
6 9
8 7
3 1
4 0
Apply the recursive binary splitting algorithm to produce a regression tree.
The objective is to minimise the residual sum of squares (RSS)
RSS = ∑_m ∑_{i: i ∈ R_m} (Yi − c_m)²,

where c_m is the prediction for Yi corresponding to the region R_m of the tree.
The stopping criterion, in order to find the regions R_m of the tree, requires all
nodes to have fewer than 4 observations. Provide the splitting rules, the regions
R_m and a diagram of the tree, as well as your calculations in detail.
[13 marks]
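For checking the hand calculation afterwards (software may not be used in the exam itself), a brute-force search over candidate cutpoints reproduces the first split; with five observations, a single split already leaves every node with fewer than 4 observations, so the recursion stops there.

```python
# One step of recursive binary splitting on the training sample of
# Question 3(a), minimising the residual sum of squares (RSS).
Y = [7, 6, 8, 3, 4]
X = [8, 9, 7, 1, 0]

def rss(ys):
    # Residual sum of squares around the region mean c_m.
    if not ys:
        return 0.0
    c = sum(ys) / len(ys)
    return sum((y - c) ** 2 for y in ys)

# Candidate cutpoints: midpoints between consecutive distinct X values.
xs = sorted(set(X))
cuts = [(a + b) / 2 for a, b in zip(xs, xs[1:])]

# Pick the cutpoint whose two regions give the smallest total RSS.
best = min(
    cuts,
    key=lambda s: rss([y for y, x in zip(Y, X) if x < s])
                + rss([y for y, x in zip(Y, X) if x >= s]),
)
left = [y for y, x in zip(Y, X) if x < best]
right = [y for y, x in zip(Y, X) if x >= best]
print(best, rss(left) + rss(right))
```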

(b) Suppose we wish to perform k-means clustering with k = 2 on the following
data set containing five observations and one variable: X = (−3, −4, 2, 3, 5).
Suppose that our random initialisation ends up with two cluster centres at the
following locations: Cluster Centre 1: X = 1; Cluster Centre 2: X = 4.
i. Show how the k-means algorithm will work from this point on. You need
to indicate what the initial cluster assignments will be, how the cluster
centres and assignments change at each step, as well as the final cluster
assignments and centres. Note that you should only need to do this for a
few iterations before you get the final solution. [8 marks]
ii. What would happen in the k-means algorithm if the observation X = 2 was
actually recorded incorrectly and its correct value was 1? [4 marks]
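Again purely as an after-the-fact check (not usable during the exam), the iterations of part (b) i can be sketched in a few lines; the data and initial centres are exactly those stated in the question.

```python
# k-means with k = 2 on the data of Question 3(b), starting from the
# given initial centres 1 and 4.
data = [-3, -4, 2, 3, 5]
centres = [1.0, 4.0]

while True:
    # Assignment step: each point joins its nearest centre.
    clusters = [[], []]
    for x in data:
        j = min((0, 1), key=lambda c: abs(x - centres[c]))
        clusters[j].append(x)
    # Update step: each centre moves to the mean of its cluster.
    new = [sum(c) / len(c) for c in clusters]
    if new == centres:  # converged: assignments no longer change
        break
    centres = new

print(clusters, centres)
```

The algorithm stabilises after a couple of iterations, with {−3, −4} in one cluster and {2, 3, 5} in the other.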
4. (a) Suppose that we have five observed points, each with four features. We present
the Euclidean distance between any two observations with measurements on
these four features in the following matrix.
1 2 3 4 5
1 0.00 0.90 0.16 0.45 0.60
2 0.90 0.00 0.55 0.50 0.04
3 0.16 0.55 0.00 0.57 0.35
4 0.45 0.50 0.57 0.00 0.30
5 0.60 0.04 0.35 0.30 0.00
Use the matrix of Euclidean distances to perform hierarchical clustering,
using single linkage. [13 marks]
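The merge order can be cross-checked after the exam with a small single-linkage implementation over the given distance matrix (observations printed 1-based to match the question):

```python
# Single-linkage agglomerative clustering on the distance matrix of
# Question 4(a).
D = [
    [0.00, 0.90, 0.16, 0.45, 0.60],
    [0.90, 0.00, 0.55, 0.50, 0.04],
    [0.16, 0.55, 0.00, 0.57, 0.35],
    [0.45, 0.50, 0.57, 0.00, 0.30],
    [0.60, 0.04, 0.35, 0.30, 0.00],
]

clusters = [frozenset([i]) for i in range(5)]  # 0-based indices for 1..5
merges = []

def link(a, b):
    # Single linkage: smallest pairwise distance between the two clusters.
    return min(D[i][j] for i in a for j in b)

while len(clusters) > 1:
    # Find and merge the closest pair of clusters.
    a, b = min(
        ((a, b) for i, a in enumerate(clusters) for b in clusters[i + 1:]),
        key=lambda p: link(*p),
    )
    merges.append((sorted(x + 1 for x in a | b), link(a, b)))
    clusters = [c for c in clusters if c not in (a, b)] + [a | b]

for members, height in merges:
    print(members, height)
```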
(b) Assume that we take a data set, divide it into equally-sized training and test
sets, and then try out two different classification procedures. First we use linear
discriminant analysis and get an error rate of 20% on the training data and
15% on the test data. Next we use 1-nearest neighbours (i.e. k = 1) and get
an average error rate (averaged over both test and training data sets) of 10%.
Based on these results, which method should we prefer to use for classification
of new observations? Why? [6 marks]
(c) Consider the following binary classification problem with Y = k, k ∈ {1, 2}.
At a data point x, P (Y = 1|X = x) = 0.4. Let x0 be the nearest neighbour
of x and P (Y = 1|X = x0 ) = p > 0. What are the values of p such that the
1-neighbour error at x is at least 0.5? [6 marks]
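One way to set up part (c), as a sketch: the 1-NN rule at x copies the label of its nearest neighbour x0, so an error occurs exactly when the two labels disagree.

```latex
% Sketch (not part of the paper): probability that the label at x and the
% label carried over from x_0 disagree.
\mathrm{err}(x)
  = P(Y=1 \mid X=x)\,(1-p) + P(Y=2 \mid X=x)\,p
  = 0.4\,(1-p) + 0.6\,p
  = 0.4 + 0.2\,p
  \;\ge\; 0.5
  \iff p \ge 0.5 .
```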
Appendix: Table of Common Distributions

Binomial(n, θ): number of successes in n independent Bernoulli trials with probability of suc-
cess θ.

• f(x|θ) = P(x|θ) = (n!/(x!(n−x)!)) θ^x (1−θ)^(n−x) for x = 0, 1, . . . , n.

• E(X) = nθ, Var(X) = nθ(1 − θ).

NegBin(r, θ): number of successes before the r-th failure in repeated independent Bernoulli trials.

• f(x|θ) = P(x|θ) = ((x+r−1)!/(x!(r−1)!)) θ^x (1−θ)^r for x = 0, 1, . . ..

• E(X) = rθ/(1−θ), Var(X) = rθ/(1−θ)².

Poisson(λ): often used for the number of events which occur in an interval of time.

• f(x|λ) = P(x|λ) = λ^x e^(−λ)/x! for x = 0, 1, . . ..
• E(X) = λ, Var(X) = λ.

Normal N(µ, σ 2 ): characterized by first two moments.


• f(x) = (2πσ²)^(−1/2) exp{−(x−µ)²/(2σ²)} for −∞ < x < ∞.

• E(X) = µ, Var(X) = σ 2 .

Beta(α, β): characterized by parameters α > 0 and β > 0.

• f(x) = x^(α−1) (1−x)^(β−1)/B(α, β) for 0 ≤ x ≤ 1, where B(α, β) = ∫₀¹ y^(α−1) (1−y)^(β−1) dy = Γ(α)Γ(β)/Γ(α+β).

• E(X) = α/(α+β), Var(X) = αβ/((α+β+1)(α+β)²).

Gamma(α, β): characterized by parameters α > 0 and β > 0.

• f(x) = (β^α/Γ(α)) x^(α−1) exp(−βx) for 0 ≤ x < ∞, where Γ(t) = ∫₀^∞ y^(t−1) e^(−y) dy.

• E(X) = α/β, Var(X) = α/β².

IGamma(α, β): characterized by parameters α > 0 and β > 0. If X ∼ Gamma(α, β), then
1/X ∼ IGamma(α, β).

• f(x) = (β^α/Γ(α)) x^(−α−1) exp(−β/x) for 0 < x < ∞.

• E(X) = β/(α−1) for α > 1, Var(X) = β²/((α−1)²(α−2)) for α > 2.

END OF PAPER
