
APS1070 – Foundations of Data Analytics and Machine Learning

Midterm Examination Fall 2019

Open book
Non-programmable & non-communicating calculators are allowed
Time allotted: 90 minutes

1. We discussed K-Nearest Neighbour Classification (k-NN) in class, a simple and
intuitive way of classifying data.

a) Here are data points plotted in 2D space: [figure not reproduced]

What is the predicted class of a new data point at x = 5, y = 5, using a
k-NN classifier and Euclidean distance with k = 3? (“ ”, “ ” or “ ”) [2]

b) In the dataset above, what is the predicted class of a new data point at
x = 11, y = 7, using Manhattan distance, for k = 5? (“ ”, “ ” or “ ”) [2]

c) In general, if k is increased, which of the following statements is correct? [2]

i. The k-NN decision boundary is smoothed and the noise sensitivity is
increased.
ii. The k-NN decision boundary is jagged and the noise sensitivity is
increased.
iii. The k-NN decision boundary is smoothed and the noise sensitivity is
decreased.
iv. The k-NN decision boundary is jagged and the noise sensitivity is
decreased.

APS1070 Fall 2019 Page 1 of 2


d) In general, if you build a k-NN classifier that achieves high accuracy on
training data, but gets poor accuracy on test data, which of the following
statements is most likely correct? [2]

i. The model is overfitting.
ii. The model is underfitting.
iii. The model is neither overfitting nor underfitting.
iv. The model is both overfitting and underfitting.
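For parts a) and b), a minimal k-NN sketch may help when working by hand. The exam's plotted points are not reproduced here, so the `train` list below is a hypothetical labeled dataset (points and class names "A", "B", "C" are assumptions, not the figure's data):

```python
from collections import Counter

def knn_predict(train, query, k, metric="euclidean"):
    """Classify `query` by majority vote among its k nearest labeled points."""
    def dist(p, q):
        if metric == "manhattan":
            return abs(p[0] - q[0]) + abs(p[1] - q[1])
        return ((p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2) ** 0.5

    # Sort labeled points by distance to the query, keep the k closest.
    neighbours = sorted(train, key=lambda row: dist(row[0], query))[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]

# Hypothetical labeled points standing in for the exam's figure.
train = [((1, 1), "A"), ((2, 2), "A"), ((6, 5), "B"), ((5, 6), "B"),
         ((9, 9), "C"), ((10, 8), "C"), ((12, 6), "C")]

print(knn_predict(train, (5, 5), k=3))                        # -> "B"
print(knn_predict(train, (11, 7), k=5, metric="manhattan"))   # -> "C"
```

With these stand-in points, the two closest "B" points dominate the k = 3 vote at (5, 5), while the three nearby "C" points dominate the k = 5 Manhattan vote at (11, 7).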

2. Here are four scatterplots, each expressing the relation between two variables:
[figures not reproduced]

Rank the datasets A, B, C and D in terms of correlation coefficient, from lowest to
highest. [2]
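Since the scatterplots themselves are not reproduced, here is a minimal pure-Python sketch of the Pearson correlation coefficient, applied to two hypothetical datasets (the sample values are assumptions for illustration only):

```python
def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length samples."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

# Hypothetical datasets standing in for the exam's scatterplots.
x = [1, 2, 3, 4, 5]
print(pearson_r(x, [2, 4, 6, 8, 10]))   # perfectly increasing: r = 1.0
print(pearson_r(x, [10, 8, 6, 4, 2]))   # perfectly decreasing: r = -1.0
```

Ranking scatterplots by r amounts to ordering them from strongly decreasing (r near −1), through uncorrelated (r near 0), to strongly increasing (r near +1).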

3. Here are two vectors x1 and x2:

x1 = [2, 1]ᵀ, x2 = [1, −2]ᵀ

a) Are x1 and x2 orthogonal? [2]

b) Calculate the norm of x1 and the norm of x2. [2]

c) Do x1 and x2 form an orthonormal basis for the vector space R²? Why? [2]
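The checks behind question 3 can be sketched directly from the definitions (dot product for orthogonality, Euclidean norm for length):

```python
import math

x1 = (2, 1)
x2 = (1, -2)

# Orthogonality: two vectors are orthogonal iff their dot product is 0.
dot = x1[0] * x2[0] + x1[1] * x2[1]

# Euclidean norms.
norm1 = math.hypot(*x1)
norm2 = math.hypot(*x2)

print(dot)            # 0, so x1 and x2 are orthogonal
print(norm1, norm2)   # both equal sqrt(5), not 1
# Orthogonal but not orthonormal: an orthonormal basis additionally
# requires unit norms, so x1/|x1| and x2/|x2| would qualify, x1 and x2 do not.
```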

4. Calculate the inverse of matrix A by Gaussian elimination. [2]

A = [ 1  1  0
     −1  0  0
      0  1  1 ]
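A hand computation like this can be checked with a short Gauss-Jordan sketch: augment A with the identity, reduce the left half to I, and read the inverse off the right half. This is a minimal pure-Python illustration, not a numerically robust routine:

```python
def invert(M):
    """Invert a square matrix by Gauss-Jordan elimination on [M | I]."""
    n = len(M)
    # Build the augmented matrix [M | I] in floats.
    aug = [list(map(float, row)) + [float(i == j) for j in range(n)]
           for i, row in enumerate(M)]
    for col in range(n):
        # Partial pivoting: bring the row with the largest pivot into place.
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        # Scale the pivot row to make the pivot 1, then clear the column.
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [v - factor * w for v, w in zip(aug[r], aug[col])]
    # The right half of the reduced augmented matrix is the inverse.
    return [row[n:] for row in aug]

A = [[1, 1, 0],
     [-1, 0, 0],
     [0, 1, 1]]
print(invert(A))   # [[0, -1, 0], [1, 1, 0], [-1, -1, 1]] (as floats)
```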

5. You build a classification model for cancer detection using an imbalanced training
dataset and achieve an accuracy of 97% when testing on new data. Explain how
this performance can be deceiving, and what performance metric(s) might be more
appropriate. [2]
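The pitfall in question 5 can be made concrete with a hypothetical test set (the 97/3 class split and the all-negative classifier below are assumptions for illustration): a model that never predicts cancer still scores high accuracy, while precision, recall, and F1 on the positive class expose the failure.

```python
def precision_recall_f1(y_true, y_pred, positive=1):
    """Precision, recall, and F1 score for the given positive class."""
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fp = sum(t != positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

# Hypothetical test set: 97 healthy (0), 3 cancer (1); model predicts all healthy.
y_true = [0] * 97 + [1] * 3
y_pred = [0] * 100

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(accuracy)                              # 0.97, despite catching no cancers
print(precision_recall_f1(y_true, y_pred))   # (0.0, 0.0, 0.0) on the cancer class
```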

