ST3189 2022 paper

The document provides instructions for an online assessment for the ST3189 Machine Learning course, scheduled for May 26, 2022. Candidates are required to complete a closed-book take-home exam consisting of four questions within a 3-hour window, with an expected effort of 2 hours. The document emphasizes academic integrity, outlining assessment offences and submission guidelines.


ST3189

BSc DEGREES AND GRADUATE DIPLOMAS IN ECONOMICS, MANAGEMENT,
FINANCE AND THE SOCIAL SCIENCES, THE DIPLOMA IN ECONOMICS AND
SOCIAL SCIENCES AND THE CERTIFICATE IN EDUCATION IN SOCIAL
SCIENCES

Summer 2022 Online Assessment Instructions

ST3189 Machine learning

Thursday 26 May 2022: 09:00 - 12:00 (BST)

The assessment will be a closed-book take-home online assessment within
a 3-hour window. The expected time/effort to answer all questions is 2 hours.

Candidates should answer all FOUR questions. All questions carry equal
marks.

A table of common distributions is provided after the final question of this paper.

You should complete this paper using pen and paper. Please use BLACK INK
only.

Handwritten work then needs to be scanned, converted to PDF and then
uploaded to the exam platform as ONE individual file. Please ensure that your
candidate number is written clearly at the top of each page included in the
scan. Please do not write your name anywhere on your submission.

Workings should be submitted for all questions requiring calculations. Any
necessary assumptions introduced in answering a question are to be stated.

You may use any calculator for any appropriate calculations, but you may not use
any computer software to obtain solutions. Credit will only be given if all workings
are shown.

You have until 12:00 (BST) on Thursday 26 May 2022 to submit your answers.
However, you are advised not to leave your submission to the last minute in order to
allow sufficient time to submit your work.

If you think there is any information missing or any error in any question, then
you should indicate this but proceed to answer the question stating any
assumptions you have made.

© University of London 2022


The assessment has been designed with a duration of 3 hours to provide a more
flexible window in which to complete the assessment. As a closed-book exam, the
expected amount of effort required to complete all questions is no more than 2
hours. Organise your time well. You are assured that in terms of answering all
questions, there will be no benefit in you going beyond the expected 2 hours of
effort. Your assessment has been carefully designed to help you show what you
have learned in the hours allocated.

By accessing this question paper, you agree not to commit any assessment offence.
Assessment offences include (but are not limited to) committing plagiarism and the
use or access of any paid-for or any other services offering live assistance
during an examination. You must not confer with anyone else during a live
examination; and we take conferring to include any exchange of information or
discussion about the assessment with others in any way that could potentially give
you or another student an advantage in the examination. As such, any exchanging
with others of exam questions; or any accessing of websites, blogs, forums or any
other form of oral or written communication with others which involves any
discussion of live examination questions or potential answers/solutions to
exam questions will be considered an assessment offence.

The University of London will conduct checks to ensure the academic integrity of
your work. Many students who break the University of London’s assessment
regulations did not intend to cheat but did not properly understand the University of
London’s regulations on referencing and plagiarism. The University of London
considers all forms of plagiarism, whether deliberate or otherwise, a very
serious matter and can apply severe penalties that might impact on your
award.

The University of London’s Procedure for the Consideration of Allegations of
Assessment Offences is available online at:

Assessment Offence Procedures - University of London



Answer all parts of the following questions.
An appendix with properties of common distributions is provided at the end.

1. (a) The lasso and best subset selection can be used for variable selection. Discuss
the main advantage and disadvantage of the lasso compared with best subset
selection. [4 marks]
(b) Consider the k-nearest neighbours classification using the Euclidean distance
on the dataset shown in Figure 1.
[Figure 1 shows a scatter plot of six labelled training points, four marked “+” and two marked “−”, on axes running from 0 to 8.]

Figure 1: For Question 1 (b).

i. Sketch the 1-nearest neighbour decision boundary and identify the regions
classified as “+” and “−”, respectively. [6 marks]
ii. What is the Leave-One-Out Cross Validation (LOOCV) error when using
3-nearest neighbours? [3 marks]
iii. What is the LOOCV error when using 5-nearest neighbours? [3 marks]
(c) Indicate whether the following statements are true or false. Briefly justify your
answers.
i. If the sensitivity of a classifier increases, so does its specificity. [3 marks]
ii. Quadratic discriminant analysis can only produce a quadratic decision
boundary. [3 marks]
iii. If we train a linear regression estimator on only half the data, the variance
of the estimator will be larger than training it on the entire dataset.
[3 marks]
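Outside the assessment itself (no software is permitted during the exam), LOOCV for k-nearest neighbours can be cross-checked with a short script. The six points below are illustrative placeholders only, NOT the actual coordinates of Figure 1, so the resulting error rates need not match the answers to parts (b) ii and iii.

```python
# LOOCV for k-NN classification on hypothetical 2-D points (illustrative
# stand-ins for Figure 1, not the real exam data).
from collections import Counter
import math

points = [((1, 2), "+"), ((2, 6), "+"), ((3, 3), "+"), ((5, 2), "+"),
          ((6, 6), "-"), ((7, 1), "-")]

def knn_predict(train, x, k):
    # Sort training points by Euclidean distance to x and take a
    # majority vote over the k nearest labels.
    nearest = sorted(train, key=lambda p: math.dist(p[0], x))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

def loocv_error(data, k):
    # Leave each point out in turn, predict it from the rest, count mistakes.
    errors = sum(
        knn_predict(data[:i] + data[i + 1:], x, k) != y
        for i, (x, y) in enumerate(data)
    )
    return errors / len(data)

print(loocv_error(points, 3), loocv_error(points, 5))
```

With k = 5 here, every left-out point is predicted by a majority over all five remaining points, so only the two minority-class points are misclassified.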
2. Consider a linear regression setting where the response variable is y = (y1, . . . , yn)
and there is one feature, or predictor, x = (x1, . . . , xn), where xi > 0 for all
i = 1, . . . , n. We are interested in fitting the following model

yi = β√xi + εi,  i = 1, . . . , n,

where the error terms εi are independent and distributed according to the Normal
distribution with mean 0 and known variance σ². Equivalently, we can write that,
given x, each yi is independent and distributed according to the Normal distribution
with mean β√xi and known variance σ².

(a) Derive the likelihood function for the unknown parameter β. [3 marks]
(b) Derive the Jeffreys prior for β. Use it to obtain the corresponding posterior
distribution. [6 marks]
(c) Consider the Normal distribution prior for β with zero mean and variance ω 2 .
Use it to obtain the corresponding posterior distribution. [6 marks]
(d) Consider the least squares criterion

∑_{i=1}^{n} (yi − β√xi)²,   (1)

and show that the estimator of β that minimises equation (1) also maximises
the likelihood function derived in part (a). Derive this estimator and, in
addition, consider the following penalised least squares criterion

∑_{i=1}^{n} (yi − β√xi)² + λβ²,   (2)

for a given λ > 0. Derive the estimator of β that minimises equation (2) and
compare it with the one that minimises equation (1). [5 marks]
(e) Provide a Bayes estimator for each of the posteriors in parts (b) and (c) and
compare them with the estimators of part (d). [5 marks]
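As a non-authoritative sketch of part (d), assuming only the model yi = β√xi + εi stated above: differentiating each criterion with respect to β and setting the derivative to zero yields closed-form estimators.

```latex
% Sketch (not part of the paper): minimising (1) and (2) in closed form.
\hat\beta
  = \arg\min_\beta \sum_{i=1}^n \bigl(y_i - \beta\sqrt{x_i}\bigr)^2
  = \frac{\sum_{i=1}^n \sqrt{x_i}\, y_i}{\sum_{i=1}^n x_i},
\qquad
\hat\beta_\lambda
  = \arg\min_\beta \Bigl\{\sum_{i=1}^n \bigl(y_i - \beta\sqrt{x_i}\bigr)^2
      + \lambda\beta^2\Bigr\}
  = \frac{\sum_{i=1}^n \sqrt{x_i}\, y_i}{\sum_{i=1}^n x_i + \lambda}.
```

The penalised estimator shrinks the least squares estimator towards zero, the hallmark of ridge-type regularisation.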
3. (a) Consider the regression task of predicting the variable Y based on the variable
X given the following training sample:
Y X
7 8
6 9
8 7
3 1
4 0
Apply the recursive binary splitting algorithm to produce a regression tree.
The objective is to minimise the residual sum of squares (RSS)
RSS = ∑_m ∑_{i: i ∈ R_m} (Yi − c_m)²,

where c_m is the prediction for Yi corresponding to the region R_m of the tree.
The stopping criterion, in order to find the regions R_m of the tree, requires all
nodes to have fewer than 4 observations. Provide the splitting rules, the regions
R_m and a diagram of the tree, as well as your calculations in detail.
[13 marks]
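For checking the hand calculation afterwards (software may not be used in the exam itself), a brute-force search over candidate cutpoints reproduces the first split; with five observations, a single split already leaves every node with fewer than 4 observations, so the recursion stops there.

```python
# One step of recursive binary splitting on the training sample of
# Question 3(a), minimising the residual sum of squares (RSS).
Y = [7, 6, 8, 3, 4]
X = [8, 9, 7, 1, 0]

def rss(ys):
    # Residual sum of squares around the region mean c_m.
    if not ys:
        return 0.0
    c = sum(ys) / len(ys)
    return sum((y - c) ** 2 for y in ys)

# Candidate cutpoints: midpoints between consecutive distinct X values.
xs = sorted(set(X))
cuts = [(a + b) / 2 for a, b in zip(xs, xs[1:])]

# Pick the cutpoint whose two regions give the smallest total RSS.
best = min(
    cuts,
    key=lambda s: rss([y for y, x in zip(Y, X) if x < s])
                + rss([y for y, x in zip(Y, X) if x >= s]),
)
left = [y for y, x in zip(Y, X) if x < best]
right = [y for y, x in zip(Y, X) if x >= best]
print(best, rss(left) + rss(right))
```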

(b) Suppose we wish to perform k-means clustering with k = 2 on the following
data set containing five observations and one variable: X = (−3, −4, 2, 3, 5).
Suppose that our random initialisation ends up with two cluster centres at the
following locations: Cluster Centre 1: X = 1; Cluster Centre 2: X = 4.
i. Show how the k-means algorithm will work from this point on. You need
to indicate what the initial cluster assignments will be, how the cluster
centres and assignments change at each step, as well as the final cluster
assignments and centres. Note that you should only need to do this for a
few iterations before you get the final solution. [8 marks]
ii. What would happen in the k-means algorithm if the observation X = 2 was
actually recorded incorrectly and its correct value was 1? [4 marks]
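Again purely as an after-the-fact check (not usable during the exam), the iterations of part (b) i can be sketched in a few lines; the data and initial centres are exactly those stated in the question.

```python
# k-means with k = 2 on the data of Question 3(b), starting from the
# given initial centres 1 and 4.
data = [-3, -4, 2, 3, 5]
centres = [1.0, 4.0]

while True:
    # Assignment step: each point joins its nearest centre.
    clusters = [[], []]
    for x in data:
        j = min((0, 1), key=lambda c: abs(x - centres[c]))
        clusters[j].append(x)
    # Update step: each centre moves to the mean of its cluster.
    new = [sum(c) / len(c) for c in clusters]
    if new == centres:  # converged: assignments no longer change
        break
    centres = new

print(clusters, centres)
```

The algorithm stabilises after a couple of iterations, with {−3, −4} in one cluster and {2, 3, 5} in the other.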
4. (a) Suppose that we have five observed points, each with four features. We present
the Euclidean distance between any two observations with measurements on
these four features in the following matrix.
1 2 3 4 5
1 0.00 0.90 0.16 0.45 0.60
2 0.90 0.00 0.55 0.50 0.04
3 0.16 0.55 0.00 0.57 0.35
4 0.45 0.50 0.57 0.00 0.30
5 0.60 0.04 0.35 0.30 0.00
Use the matrix of Euclidean distances to perform hierarchical clustering,
using single linkage. [13 marks]
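The merge order can be cross-checked after the exam with a small single-linkage implementation over the given distance matrix (observations printed 1-based to match the question):

```python
# Single-linkage agglomerative clustering on the distance matrix of
# Question 4(a).
D = [
    [0.00, 0.90, 0.16, 0.45, 0.60],
    [0.90, 0.00, 0.55, 0.50, 0.04],
    [0.16, 0.55, 0.00, 0.57, 0.35],
    [0.45, 0.50, 0.57, 0.00, 0.30],
    [0.60, 0.04, 0.35, 0.30, 0.00],
]

clusters = [frozenset([i]) for i in range(5)]  # 0-based indices for 1..5
merges = []

def link(a, b):
    # Single linkage: smallest pairwise distance between the two clusters.
    return min(D[i][j] for i in a for j in b)

while len(clusters) > 1:
    # Find and merge the closest pair of clusters.
    a, b = min(
        ((a, b) for i, a in enumerate(clusters) for b in clusters[i + 1:]),
        key=lambda p: link(*p),
    )
    merges.append((sorted(x + 1 for x in a | b), link(a, b)))
    clusters = [c for c in clusters if c not in (a, b)] + [a | b]

for members, height in merges:
    print(members, height)
```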
(b) Assume that we take a data set, divide it into equally-sized training and test
sets, and then try out two different classification procedures. First we use linear
discriminant analysis and get an error rate of 20% on the training data and
15% on the test data. Next we use 1-nearest neighbours (i.e. k = 1) and get
an average error rate (averaged over both test and training data sets) of 10%.
Based on these results, which method should we prefer to use for classification
of new observations? Why? [6 marks]
(c) Consider the following binary classification problem with Y = k, k ∈ {1, 2}.
At a data point x, P (Y = 1|X = x) = 0.4. Let x0 be the nearest neighbour
of x and P (Y = 1|X = x0 ) = p > 0. What are the values of p such that the
1-neighbour error at x is at least 0.5? [6 marks]
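One way to set up part (c), as a sketch: the 1-NN rule at x copies the label of its nearest neighbour x0, so an error occurs exactly when the two labels disagree.

```latex
% Sketch (not part of the paper): probability that the label at x and the
% label carried over from x_0 disagree.
\mathrm{err}(x)
  = P(Y=1 \mid X=x)\,(1-p) + P(Y=2 \mid X=x)\,p
  = 0.4\,(1-p) + 0.6\,p
  = 0.4 + 0.2\,p
  \;\ge\; 0.5
  \iff p \ge 0.5 .
```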
Appendix: Table of Common Distributions

Binomial(n, θ): number of successes in n independent Bernoulli trials with probability of suc-
cess θ.

• f(x|θ) = P(x|θ) = (n!/(x!(n−x)!)) θ^x (1−θ)^(n−x) for x = 0, 1, . . . , n.

• E(X) = nθ, Var(X) = nθ(1 − θ).

NegBin(r, θ): number of successes before the r-th failure in repeated independent Bernoulli trials.

• f(x|θ) = P(x|θ) = ((x+r−1)!/(x!(r−1)!)) θ^x (1−θ)^r for x = 0, 1, . . ..

• E(X) = rθ/(1−θ), Var(X) = rθ/(1−θ)².

Poisson(λ): often used for the number of events which occur in an interval of time.

• f(x|λ) = P(x|λ) = λ^x e^(−λ)/x! for x = 0, 1, . . ..
• E(X) = λ, Var(X) = λ.

Normal N(µ, σ 2 ): characterized by first two moments.


• f(x) = (2πσ²)^(−1/2) exp{−(x−µ)²/(2σ²)} for −∞ < x < ∞.

• E(X) = µ, Var(X) = σ 2 .

Beta(α, β): characterized by parameters α > 0 and β > 0.

• f(x) = x^(α−1) (1−x)^(β−1)/B(α, β) for 0 ≤ x ≤ 1, where B(α, β) = ∫₀¹ y^(α−1) (1−y)^(β−1) dy = Γ(α)Γ(β)/Γ(α+β).

• E(X) = α/(α+β), Var(X) = αβ/((α+β+1)(α+β)²).

Gamma(α, β): characterized by parameters α > 0 and β > 0.

• f(x) = (β^α/Γ(α)) x^(α−1) exp(−βx) for 0 ≤ x < ∞, where Γ(t) = ∫₀^∞ y^(t−1) e^(−y) dy.

• E(X) = α/β, Var(X) = α/β².

IGamma(α, β): characterized by parameters α > 0 and β > 0. If X ∼ Gamma(α, β), then
1/X ∼ IGamma(α, β).

• f(x) = (β^α/Γ(α)) x^(−α−1) exp(−β/x) for 0 < x < ∞.

• E(X) = β/(α−1) for α > 1, Var(X) = β²/((α−1)²(α−2)) for α > 2.

END OF PAPER
