Concordia University Machine Learning Assignment with solutions

Uploaded by abdurrafiqit

IT 593 (A) - Machine Learning Tools and Techniques

Assignment 2 (30 points)


Due date: 28 October, 2022 11:59 PM.

K-means Algorithm
1. Let a configuration of the k-means algorithm correspond to the k-way partition (of the set of
instances to be clustered) generated by the clustering at the end of each iteration. Is it possible for the
k-means algorithm to revisit a configuration? Justify how your answer proves that the k-means algorithm
converges in a finite number of steps.

Answer:

If the k-way partition does not change between two consecutive iterations, the algorithm has
converged. Whenever the partition does change, the sum of squared errors strictly decreases, so
the algorithm can never revisit a configuration. Since a finite set of instances admits only
finitely many k-way partitions, k-means must eventually run out of new configurations and
therefore converges in a finite number of steps.

2. Suppose you are given the following <x,y> pairs. You will simulate the k-means algorithm to identify
TWO clusters in the data.
Suppose you are given the initial cluster centers as {cluster 1: #1}, {cluster 2: #10} – the first data
point is used as the first cluster center and the 10th as the second. Please simulate the
k-means (k = 2) algorithm for ONE iteration. What are the cluster assignments after ONE iteration? Assume
k-means uses Euclidean distance. What are the cluster assignments at convergence? (Fill in the table
below)
Data #   After one iteration   After convergence
1        1                     1
2        1                     1
3        1                     1
4        1                     1
5        1                     1
6        2                     2
7        2                     2
8        2                     1
9        2                     2
10       2                     2
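The assignment step and center-update step described above can be sketched in code. Since the actual <x, y> pairs are not reproduced here, the data below are hypothetical stand-ins chosen to form two clear clusters; the initialization mirrors the question (data point #1 and data point #10 as the starting centers).

```python
import math

# Hypothetical 2-D points standing in for the assignment's <x, y> pairs,
# which are not reproduced in this document.
points = [(1.0, 1.0), (1.5, 2.0), (2.0, 1.5), (1.2, 1.8), (1.8, 1.2),
          (8.0, 8.0), (8.5, 7.5), (7.5, 8.5), (8.2, 7.8), (9.0, 8.0)]

def kmeans_one_iteration(points, centers):
    # Assignment step: each point goes to its nearest center (Euclidean).
    assignments = [min(range(len(centers)),
                       key=lambda c: math.dist(p, centers[c]))
                   for p in points]
    # Update step: each center moves to the mean of its assigned points.
    new_centers = []
    for c in range(len(centers)):
        members = [p for p, a in zip(points, assignments) if a == c]
        new_centers.append(tuple(sum(coord) / len(members)
                                 for coord in zip(*members)))
    return assignments, new_centers

# Initial centers: data point #1 and data point #10 (0-indexed: 0 and 9).
centers = [points[0], points[9]]
assignments, centers = kmeans_one_iteration(points, centers)
print(assignments)  # cluster index (0 or 1) for each point
```

With these stand-in points, the first five points land in cluster 0 and the last five in cluster 1 after one iteration, and each center moves to the mean of its cluster.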
Naïve Bayes Algorithm
Consider the problem of binary classification using the Naive Bayes classifier. You are given
two-dimensional features (X1, X2) and the categorical class-conditional distributions in the tables below.
The entries in the tables correspond to P(X1 = x1|Ci) and P(X2 = x2|Ci) respectively. The two classes are
equally likely.

Given a data point (−1, 1), calculate the following posterior probabilities:

1. P(C1|X1 = −1, X2 = 1)

using Bayes' rule and the conditional independence assumption of naive Bayes


P(C1|X1 = −1, X2 = 1) = [P(X1 = −1, X2 = 1|C1) P(C1)] / P(X1 = −1, X2 = 1)
= [P(X1 = −1|C1) P(X2 = 1|C1) P(C1)] / [P(X1 = −1|C1) P(X2 = 1|C1) P(C1) + P(X1 = −1|C2) P(X2 = 1|C2) P(C2)]
= 0.1

2. P(C2|X1 = −1, X2 = 1)

For P(C2 | X1 = −1, X2 = 1):


P(C2 | X1 = −1, X2 = 1) = 1 − P(C1 | X1 = −1, X2 = 1)
= 1 − 0.1
= 0.9

Multiple choice Questions:

Naive Bayes assumes ___?

A. Conditional Independence
B. Conditional Dependence

C. Both a and b

D. None of the above

Naive Bayes requires?

A. Categorical Values

B. Numerical Values

C. Either a or b

D. Both a and b

Spam classification is an example of ___?


A. Naive Bayes

B. Probabilistic condition

C. Random Forest

D. All the Above

The time complexity of the Naive Bayes classifier for n features and L classes is

A. n*L

B . O(n+L)

C. O(n*L)

D. O(n/L)

In Naive Bayes, numerical variables must be binned and converted to ___?

A. Categorical Values

B. Numerical Values

C. Either a or b

D. Both a and b
Logistic Regression

Assume that we have two possible conditional distributions (P(y = 1|x, w)) obtained by training a logistic
regression on the dataset shown in Figure 2. In the first case, the value of P(y = 1|x, w) is equal to 1/3 for
all the data points. In the second case, P(y = 1|x, w) is equal to zero for x = 1 and is equal to 1 for all other
data points. One of these conditional distributions is obtained by finding the maximum likelihood of the
parameter w. Which one is the MLE solution? Justify your answer in at most three sentences.

Solution:

The MLE solution is the first case, where P(y = 1|x, w) = 1/3 for all data points. The second
distribution assigns probability zero to at least one observed label, which makes the likelihood
of the dataset zero; the constant 1/3 assigns positive probability to every observed label and
therefore has strictly greater likelihood.
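The likelihood comparison can be checked numerically. Since Figure 2 is not reproduced here, the tiny dataset below is a hypothetical stand-in: three points, with the point at x = 1 labeled y = 1 and the rest labeled y = 0, which is consistent with a constant 1/3 being the better fit.

```python
# Hypothetical stand-in for the dataset in Figure 2 (not reproduced):
# the point at x = 1 has label y = 1; the others have y = 0.
xs = [1, 2, 3]
ys = [1, 0, 0]

def likelihood(p_y1):
    """Likelihood of the labels under p_y1: a function x -> P(y=1 | x, w)."""
    like = 1.0
    for x, y in zip(xs, ys):
        p = p_y1(x)
        like *= p if y == 1 else (1 - p)
    return like

case1 = likelihood(lambda x: 1 / 3)                    # constant 1/3
case2 = likelihood(lambda x: 0.0 if x == 1 else 1.0)   # 0 at x=1, 1 elsewhere

print(case1)  # 4/27, about 0.148
print(case2)  # 0.0 - it assigns probability zero to observed labels
```

Any distribution that gives probability zero to an observed label drives the product to zero, so it can never be the maximum-likelihood solution when a strictly positive alternative exists.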

Figure 2
1. Logistic regression is used for ___?
A. classification
B. regression
C. clustering
D. All of these

2. Logistic Regression is a Machine Learning algorithm that is used to predict the probability of a ___?
A. categorical independent variable
B. categorical dependent variable.
C. numerical dependent variable.
D. numerical independent variable
3. You are predicting whether an email is spam or not. Based on the features, you obtained an estimated
probability to be 0.75. What’s the meaning of this estimated probability? (select two)
A. there is 25% chance that the email will be spam
B. there is 75% chance that the email will be spam
C. there is 75% chance that the email will not be spam
D. there is 25% chance that the email will not be spam

4. In a logistic regression model, the decision boundary can be ___.


A. linear
B. non-linear
C. both (A) and (B)
D. none of these

5. What’s the cost function of the logistic regression?


A. Sigmoid function
B. Logistic Function
C. both (A) and (B)
D. none of these
6. Why can't the cost function used for linear regression be used for logistic regression?

A. Linear regression uses mean squared error as its cost function. If this is used for logistic regression,
then it will be a non-convex function of its parameters. Gradient descent will converge into global
minimum only if the function is convex.
B. Linear regression uses mean squared error as its cost function. If this is used for logistic regression,
then it will be a convex function of its parameters. Gradient descent will converge into global minimum
only if the function is convex.
C. Linear regression uses mean squared error as its cost function. If this is used for logistic regression,
then it will be a non-convex function of its parameters. Gradient descent will converge into global
minimum only if the function is non-convex.

D. Linear regression uses mean squared error as its cost function. If this is used for logistic regression,
then it will be a convex function of its parameters. Gradient descent will converge into global minimum
only if the function is non-convex.
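The convexity claim in option A can be verified numerically. A minimal sketch for a single training point (x = 1, y = 1), scanning a scalar weight w: the squared-error loss composed with the sigmoid has second differences that change sign (non-convex), while the log-loss does not.

```python
import math

def sigmoid(z):
    return 1 / (1 + math.exp(-z))

# One training point (x = 1, y = 1); scan the scalar weight w on [-6, 6].
ws = [w / 10 for w in range(-60, 61)]
mse = [(sigmoid(w) - 1) ** 2 for w in ws]       # squared error + sigmoid
logloss = [-math.log(sigmoid(w)) for w in ws]   # cross-entropy (log-loss)

def second_differences(f):
    # Discrete analogue of the second derivative on a uniform grid.
    return [f[i - 1] - 2 * f[i] + f[i + 1] for i in range(1, len(f) - 1)]

# A convex function has non-negative second differences everywhere.
print(all(d >= 0 for d in second_differences(logloss)))  # True: convex
print(all(d >= 0 for d in second_differences(mse)))      # False: non-convex
```

This is why gradient descent on the log-loss reliably reaches the global minimum, while on the squared-error surface it can stall in flat or concave regions.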
7. You are predicting whether an email is spam or not. Based on the features, you obtained an estimated
probability to be 0.75. What is the meaning of this estimated probability? The threshold to differ the
classes is 0.5.
A. The email is not spam
B. The email is spam
C. Can’t determine
D. both (A) and (B)

8. What’s the hypothesis of logistic regression?


A. to limit the cost function between 0 and 1
B. to limit the cost function between -1 and 1
C. to limit the cost function between -infinity and +infinity
D. to limit the cost function between 0 and +infinity

9. Which one is not true?


A. If we take the weighted sum of inputs as the output as we do in Linear Regression, the value can be
more than 1 but we want a value between 0 and 1. That’s why Linear Regression can’t be used for
classification tasks.
B. Logistic Regression is a generalized Linear Regression in the sense that we don’t output the weighted
sum of inputs directly, but we pass it through a function that can map any real value between 0 and 1.
C. The value of the sigmoid function always lies between 0 and 1
D. Logistic Regression is used to determine the value of a continuous dependent variable

10. In a logistic regression, if the predicted logit is 0, what’s the transformed probability?

A. 0


B. 1
C. 0.5
D. 0.05
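The answer to the last question follows directly from the logistic function: a logit of 0 maps to exactly 0.5.

```python
import math

def sigmoid(z):
    # Logistic function: maps a logit (any real number) to a probability.
    return 1 / (1 + math.exp(-z))

print(sigmoid(0))  # 0.5 - a logit of 0 sits exactly on the 0.5 threshold
```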
