Concordia University Machine Learning Assignment with Solutions
K-means Algorithm
1. Let a configuration of the k-means algorithm correspond to the k-way partition (on the set of
instances to be clustered) generated by the clustering at the end of each iteration. Is it possible for the
k-means algorithm to revisit a configuration? Justify how your answer proves that the k-means algorithm
converges in a finite number of steps.
Answer:
If the k-way partition does not change between iterations, the k-means algorithm has converged;
otherwise the partition must change, and every change strictly decreases the mean squared error.
A configuration therefore cannot be revisited, since returning to an earlier partition would require
the monotonically decreasing error to return to an earlier value. Because there are only finitely many
k-way partitions of a finite set of instances, the algorithm must eventually run out of new
configurations and converge in a finite number of steps.
2. Suppose you are given the following <x,y> pairs. You will simulate the k-means algorithm to identify
TWO clusters in the data.
Suppose you are given the initial cluster centers as {cluster 1: #1}, {cluster 2: #10} – the first data
point is used as the first cluster center and the 10th as the second cluster center. Please simulate the
k-means (k=2) algorithm for ONE iteration. What are the cluster assignments after ONE iteration? Assume
k-means uses Euclidean distance. What are the cluster assignments at convergence? (Fill in the table
below)
Data #   Cluster assignment after one iteration   Cluster assignment after convergence
1        1                                        1
2        1                                        1
3        1                                        1
4        1                                        1
5        1                                        1
6        2                                        2
7        2                                        2
8        2                                        1
9        2                                        2
10       2                                        2
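A minimal Python sketch of the simulation is given below. The original <x,y> pairs are not reproduced in this document, so the data points in the sketch are hypothetical placeholders (its printed assignments will not match the table above); the per-iteration SSE it reports also illustrates the monotone decrease used in the answer to question 1.

import numpy as np

def kmeans(points, centers, max_iter=100):
    # Lloyd's algorithm: alternate assignment and center updates,
    # printing the cluster assignments and the SSE at every iteration.
    for it in range(1, max_iter + 1):
        # Assign each point to the nearest center (Euclidean distance).
        dists = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        sse = (dists[np.arange(len(points)), labels] ** 2).sum()
        print(f"iteration {it}: assignments = {labels + 1}, SSE = {sse:.3f}")
        # Recompute each center as the mean of its assigned points.
        new_centers = np.array([points[labels == c].mean(axis=0)
                                for c in range(len(centers))])
        if np.allclose(new_centers, centers):  # partition unchanged: converged
            return labels
        centers = new_centers
    return labels

# Hypothetical data; cluster 1 is seeded at point #1 and cluster 2 at point #10.
pts = np.array([[1.0, 1.0], [1.5, 2.0], [2.0, 1.5], [1.0, 2.5], [2.5, 2.0],
                [6.0, 6.5], [7.0, 6.0], [4.0, 3.5], [6.5, 7.0], [7.5, 7.5]])
kmeans(pts, centers=pts[[0, 9]])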
Naïve Bayes Algorithm
Consider the problem of binary classification using the Naive Bayes classifier. You are given
two-dimensional features (X1, X2) and the categorical class-conditional distributions in the tables below.
The entries in the tables correspond to P(X1 = x1|Ci) and P(X2 = x2|Ci) respectively. The two classes are
equally likely.
Given a data point (−1, 1), calculate the following posterior probabilities:
1. P(C1|X1 = −1, X2 = 1)
2. P(C2|X1 = −1, X2 = 1)
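Since the class-conditional tables are not reproduced in this document, the Python sketch below shows only the mechanics of the posterior computation; the values in p_x1_given and p_x2_given are hypothetical placeholders that should be replaced by the actual table entries P(X1 = -1|Ci) and P(X2 = 1|Ci).

# Hypothetical placeholder values; substitute the real table entries.
p_x1_given = {"C1": 0.2, "C2": 0.6}  # P(X1 = -1 | Ci)
p_x2_given = {"C1": 0.5, "C2": 0.3}  # P(X2 =  1 | Ci)
prior = {"C1": 0.5, "C2": 0.5}       # the two classes are equally likely

# Naive Bayes assumes X1 and X2 are conditionally independent given the
# class, so the joint likelihood factorizes into per-feature terms.
joint = {c: p_x1_given[c] * p_x2_given[c] * prior[c] for c in prior}
evidence = sum(joint.values())                  # P(X1 = -1, X2 = 1)
posterior = {c: joint[c] / evidence for c in joint}
print(posterior)  # the two posteriors sum to 1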
The Naive Bayes classifier is built on which assumption about the features, given the class?
A. Conditional Independence
B. Conditional Dependence
C. Both A and B
Naive Bayes can be applied to which type of feature values?
A. Categorical Values
B. Numerical Values
C. Either A or B
D. Both A and B
B. Probabilistic condition
C. Random Forest
A. n*L
B. O(n+L)
C. O(n*L)
D. O(n/L)
Logistic Regression
Assume that we have two possible conditional distributions (P(y = 1|x, w)) obtained by training a logistic
regression on the dataset shown in Figure 2. In the first case, the value of P(y = 1|x, w) is equal to 1/3 for
all the data points. In the second case, P(y = 1|x, w) is equal to zero for x = 1 and is equal to 1 for all other
data points. One of these conditional distributions is obtained by finding the maximum likelihood of the
parameter w. Which one is the MLE solution? Justify your answer in at most three sentences.
Solution:
The MLE solution is the first case, where P(y = 1|x, w) equals 1/3 for all the data points. The
second distribution assigns probability zero to at least one observed label in the dataset, so its
likelihood is zero, while the first case attains a strictly positive likelihood.
Figure 2 (dataset not reproduced)
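The justification can be checked numerically. The sketch below compares the log-likelihood of the two candidate distributions on a hypothetical stand-in for the Figure 2 data (one third of the labels are 1, with the point at x = 1 labeled y = 1); any dataset with these properties gives the second case a likelihood of zero.

import math

data = [(1, 1), (2, 0), (3, 0)]  # hypothetical (x, y) pairs standing in for Figure 2

def log_likelihood(p_of):
    # Sum of log P(y|x) over the data; -inf if any point receives probability 0.
    total = 0.0
    for x, y in data:
        p1 = p_of(x)                  # the model's P(y = 1 | x)
        p = p1 if y == 1 else 1.0 - p1
        if p == 0.0:
            return float("-inf")
        total += math.log(p)
    return total

def case1(x): return 1.0 / 3.0               # constant 1/3 everywhere
def case2(x): return 0.0 if x == 1 else 1.0  # 0 at x = 1, 1 elsewhere

print(log_likelihood(case1))  # finite, i.e. positive likelihood
print(log_likelihood(case2))  # -inf, i.e. likelihood 0, so case 1 is the MLE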
1. Logistic regression is used for ___?
A. classification
B. regression
C. clustering
D. All of these
2. Logistic Regression is a Machine Learning algorithm that is used to predict the probability of a ___?
A. categorical independent variable
B. categorical dependent variable
C. numerical dependent variable
D. numerical independent variable
3. You are predicting whether an email is spam or not. Based on the features, you obtained an estimated
probability of 0.75. What is the meaning of this estimated probability? (select two)
A. there is 25% chance that the email will be spam
B. there is 75% chance that the email will be spam
C. there is 75% chance that the email will not be spam
D. there is 25% chance that the email will not be spam
4. Why is mean squared error not used as the cost function for logistic regression?
A. Linear regression uses mean squared error as its cost function. If this is used for logistic regression,
then it will be a non-convex function of its parameters. Gradient descent will converge to the global
minimum only if the function is convex.
B. Linear regression uses mean squared error as its cost function. If this is used for logistic regression,
then it will be a convex function of its parameters. Gradient descent will converge to the global minimum
only if the function is convex.
C. Linear regression uses mean squared error as its cost function. If this is used for logistic regression,
then it will be a non-convex function of its parameters. Gradient descent will converge to the global
minimum only if the function is non-convex.
D. Linear regression uses mean squared error as its cost function. If this is used for logistic regression,
then it will be a convex function of its parameters. Gradient descent will converge to the global minimum
only if the function is non-convex.
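The convexity claim in option A can be illustrated numerically. The sketch below evaluates the mean-squared-error loss and the cross-entropy loss of a one-parameter sigmoid model on a small hypothetical dataset and checks the second differences over a grid of weights: a convex function has nonnegative second differences, and the MSE curve fails this check while the cross-entropy curve passes it.

import numpy as np

xs = np.array([-2.0, -1.0, 1.0, 2.0])  # hypothetical inputs
ys = np.array([0.0, 0.0, 1.0, 1.0])    # hypothetical labels

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

ws = np.linspace(-10, 10, 401)
mse = np.array([np.mean((sigmoid(w * xs) - ys) ** 2) for w in ws])
xent = np.array([-np.mean(ys * np.log(sigmoid(w * xs))
                          + (1 - ys) * np.log(1.0 - sigmoid(w * xs)))
                 for w in ws])

# Nonnegative second differences on a uniform grid indicate convexity.
print("MSE convex?          ", bool(np.all(np.diff(mse, 2) >= -1e-12)))   # False
print("Cross-entropy convex?", bool(np.all(np.diff(xent, 2) >= -1e-12)))  # True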
7. You are predicting whether an email is spam or not. Based on the features, you obtained an estimated
probability of 0.75. What is the meaning of this estimated probability? The threshold used to separate the
classes is 0.5.
A. The email is not spam
B. The email is spam
C. Can’t determine
D. both (A) and (B)
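For question 7, the decision reduces to comparing the estimated probability with the threshold; the tiny sketch below (the function name classify is purely illustrative) makes the answer explicit: 0.75 >= 0.5, so the email is classified as spam (option B).

def classify(p_spam, threshold=0.5):
    # Predict the positive class when the estimated probability
    # reaches the decision threshold.
    return "spam" if p_spam >= threshold else "not spam"

print(classify(0.75))  # spam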