
Homework Assignment 4

CSE 151A: Introduction to Machine Learning

Due: May 30th, 2023, 9:30am (Pacific Time)

Instructions: Please answer the questions below, attach your code in the document, and
insert figures to create a single PDF file. You may search for information online, but you
need to write the code and work out the answers to the questions yourself.

Grade: out of 100 points

1 (40 points) Naïve Bayes


In this question, we would like to build a Naïve Bayes model for a classification task. Assume
there is a classification dataset S = {(x^(i), y^(i)), i = 1, ..., 8} where each data point (x, y)
contains a feature vector x = (x1, x2, x3) with x1, x2, x3 ∈ {0, 1} and a ground-truth label
y ∈ {0, 1}. The dataset S can be read from the table below:

i x1 x2 x3 y
1 0 0 1 1
2 0 1 1 1
3 1 1 0 1
4 0 0 1 1
5 0 1 0 0
6 1 1 0 0
7 1 0 0 0
8 0 0 1 0

In the Naïve Bayes model, we use the random variable Xi ∈ {0, 1} to represent the i-th dimension
of the feature vector x, and the random variable Y ∈ {0, 1} to represent the class label y. Thus,
we can estimate the probabilities P(Y), P(Xi | Y) and P(Xi, Y) by counting data points in dataset
S, for example:

P(Y = 1) = #{data points with y = 1} / #{all data points} = 4/8 = 0.5

P(X1 = 1 | Y = 0) = #{data points with x1 = 1 and y = 0} / #{data points with y = 0} = 2/4 = 0.5

P(X1 = 1, Y = 1) = P(X1 = 1 | Y = 1) P(Y = 1)
                 = #{data points with x1 = 1 and y = 1} / #{all data points} = 1/8 = 0.125

It is noteworthy that only the probabilities P(Y), P(Xi | Y) and P(Xi, Y) can be directly estimated
from dataset S in the Naïve Bayes model. Other joint probabilities (e.g. P(X1, X2) and
P(X1, X2, X3)) should not be estimated by directly counting the data points.
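To make the counting above concrete, here is a small optional Python sketch (not part of the required deliverables; the array names X and y are my own) that reproduces the three example probabilities directly from the table:

import numpy as np

# Dataset S from the table above: rows are i = 1..8, columns are (x1, x2, x3).
X = np.array([[0, 0, 1],
              [0, 1, 1],
              [1, 1, 0],
              [0, 0, 1],
              [0, 1, 0],
              [1, 1, 0],
              [1, 0, 0],
              [0, 0, 1]])
y = np.array([1, 1, 1, 1, 0, 0, 0, 0])

p_y1 = np.mean(y == 1)                            # P(Y = 1) = 4/8 = 0.5
p_x1_given_y0 = np.mean(X[y == 0, 0] == 1)        # P(X1 = 1 | Y = 0) = 2/4 = 0.5
p_x1_and_y1 = np.mean((X[:, 0] == 1) & (y == 1))  # P(X1 = 1, Y = 1) = 1/8 = 0.125

print(p_y1, p_x1_given_y0, p_x1_and_y1)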

Next, we can use the probabilities P(Y) and P(Xi | Y) to build our Naïve Bayes model for
classification: for a feature vector x = (x1, x2, x3), we can estimate the probability
P(Y = y | X1 = x1, X2 = x2, X3 = x3) with the conditional independence assumptions:

P(Y = y | X1 = x1, X2 = x2, X3 = x3)
    = P(X1 = x1, X2 = x2, X3 = x3, Y = y) / P(X1 = x1, X2 = x2, X3 = x3)
    = P(X1 = x1, X2 = x2, X3 = x3 | Y = y) P(Y = y) / P(X1 = x1, X2 = x2, X3 = x3)
    = [ ∏_{i=1}^{3} P(Xi = xi | Y = y) ] P(Y = y) / P(X1 = x1, X2 = x2, X3 = x3)

where the joint probability P(X1 = x1, X2 = x2, X3 = x3) can be calculated as:

P(X1 = x1, X2 = x2, X3 = x3)
    = Σ_{y=0}^{1} P(X1 = x1, X2 = x2, X3 = x3, Y = y)
    = Σ_{y=0}^{1} P(X1 = x1, X2 = x2, X3 = x3 | Y = y) P(Y = y)
    = Σ_{y=0}^{1} [ ∏_{i=1}^{3} P(Xi = xi | Y = y) ] P(Y = y)

Finally, if we find:

P(Y = 1 | X1 = x1, X2 = x2, X3 = x3) > P(Y = 0 | X1 = x1, X2 = x2, X3 = x3),

then we predict the class of the feature vector x = (x1, x2, x3) to be 1, and otherwise 0. It is
noteworthy that although conditional independence assumptions are made in the Naïve Bayes
model, P(Y = 1 | X1 = x1, X2 = x2, X3 = x3) + P(Y = 0 | X1 = x1, X2 = x2, X3 = x3) should
still be 1.
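As an optional illustration of the formulas above (a sketch only, not part of the assignment; it is evaluated on a feature vector that is not asked about in the questions below, and the helper names cond_prob and posterior are my own), the whole Naïve Bayes computation can be written as:

import numpy as np

X = np.array([[0, 0, 1], [0, 1, 1], [1, 1, 0], [0, 0, 1],
              [0, 1, 0], [1, 1, 0], [1, 0, 0], [0, 0, 1]])
y = np.array([1, 1, 1, 1, 0, 0, 0, 0])

def cond_prob(j, xj, label):
    """Estimate P(X_j = xj | Y = label) by counting in S."""
    return np.mean(X[y == label, j] == xj)

def posterior(x):
    """Return (P(Y=0 | X=x), P(Y=1 | X=x)) under the conditional independence assumptions."""
    unnorm = []
    for label in (0, 1):
        p = np.mean(y == label)           # P(Y = label)
        for j, xj in enumerate(x):
            p *= cond_prob(j, xj, label)  # times P(X_j = x_j | Y = label)
        unnorm.append(p)
    z = sum(unnorm)                       # the joint probability P(X1 = x1, X2 = x2, X3 = x3)
    return unnorm[0] / z, unnorm[1] / z

p0, p1 = posterior((0, 0, 1))     # an arbitrary example input
prediction = 1 if p1 > p0 else 0
print(p0 + p1)                    # should print 1.0, as noted above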

1. (15 pts) Please estimate the following probabilities:

(1) P(X1 = 1, Y = 0), (2) P(Y = 0), (3) P(X1 = 1 | Y = 1).

Note that these probabilities can be directly estimated by counting from dataset S.

2. (18 pts) Please calculate the probability P(Y = 1 | X1 = 1, X2 = 1, X3 = 0) in the Naïve
Bayes model using the conditional independence assumptions.

3. (7 pts) Please calculate the probability P(Y = 0 | X1 = 1, X2 = 1, X3 = 0) in the Naïve
Bayes model and predict the class of feature vector x = (1, 1, 0).
2 (40 points) Decision Tree

In this question, we would like to create a decision tree model for a binary classification task.
Assume there is a classification dataset T = {(x^(i), y^(i)), i = 1, ..., 5} where each data point
(x, y) contains a feature vector x = (x1, x2) ∈ R² and a ground-truth label y ∈ {0, 1}. The
dataset T can be read from the table below:

i   x1    x2    y
1   1.0   2.0   1
2   2.0   2.0   1
3   3.0   2.0   0
4   2.0   3.0   0
5   1.0   3.0   0
To build the decision tree model, we use a simplified CART algorithm, which is a recursive
procedure as follows:

• Initialize a root node with dataset T and set it as the current node.
• Start a procedure for the current node:
  – Step 1: Assume the dataset in the current node is Tcur. Check whether all data points in
    Tcur are in the same class:
    ∗ If it is true, set the current node as a leaf node that predicts the common class in
      Tcur, and then terminate the current procedure.
    ∗ If it is false, continue the procedure.
  – Step 2: Traverse all possible splitting rules. Each splitting rule is represented by
    a vector (j, t), which compares feature xj with threshold t to split the dataset Tcur
    into two subsets T1, T2:

        T1 = {(x, y) ∈ Tcur where xj ≤ t},
        T2 = {(x, y) ∈ Tcur where xj > t}.

    We will traverse the rules over all feature dimensions j ∈ {0, 1} and thresholds
    t ∈ {xj | (x, y) ∈ Tcur}.
  – Step 3: Decide the best splitting rule. The best splitting rule (j*, t*) minimizes
    the weighted sum of the Gini indices of T1, T2:

        (j*, t*) = argmin_{j,t} [ |T1| Gini(T1) + |T2| Gini(T2) ] / ( |T1| + |T2| ),

    where Gini(·) is defined as:

        Gini(Ti) = 1 − Σ_{y=0}^{1} P(Y = y)²,   with   P(Y = y) = #{data points with label y in Ti} / #{data points in Ti}.

    (A short worked example of this Gini computation is given right after this list.)
  – Step 4: We split the dataset Tcur into two subsets T1*, T2* following the best
    splitting rule (j*, t*). Then we set the current node as a branch node and create child
    nodes with the subsets T1*, T2* respectively. For each child node, start from Step 1
    again recursively.
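As a quick sanity check of the Gini definition in Step 3 (this worked example is an addition to the handout, not an answer to any of the questions below): the full dataset T contains 2 points with label y = 1 and 3 points with label y = 0, so

    Gini(T) = 1 − (2/5)² − (3/5)² = 1 − 0.16 − 0.36 = 0.48,

which matches the Gini value of 0.480 reported for the root node in the attached notebook output.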

If we run the above decision tree building procedure on dataset T, the generated tree is as
shown below:

T
├── x2 ≤ 2.0 → T1*
│     ├── x1 ≤ 2.0 → T11*
│     └── x1 > 2.0 → T12*
└── x2 > 2.0 → T2*

Please answer the questions:

1. (16 pts) Calculate the subsets T1*, T2*, T11*, T12* using the given decision tree.

2. (12 pts) Calculate Gini(T1*) and Gini(T2*).

3. (12 pts) With the given tree, we can predict the class of a feature vector x = (x1, x2):

• Start from the root node of the tree:
  – Step 1: If the current node is a branch node, we evaluate the conditions on the branch
    edges with x, choose the satisfied branch to go through, and repeat Step 1.
  – Step 2: If the current node is a leaf node, the common class of the subset in the
    leaf node is used as the prediction.

Please predict the class of the following feature vectors using the given tree:
(1) x = (2, 1),
(2) x = (3, 1),
(3) x = (3, 3).

4. (Bonus Question, 10 pts extra) In this question, you need to implement the decision
tree algorithm. Please download the Jupyter notebook HW4_Decision_Tree.ipynb and
fill in the blanks. Note that since the same dataset T is used in the notebook, you can
use the code to check whether your previous answers are correct. Please attach your
code and results in your Gradescope submission.

3 (20 points) Bagging and Boosting

Assume we obtain T linear classifiers {ht, t = 1, ..., T} where each classifier h : R² → {+1, −1}
predicts the class ŷ ∈ {+1, −1} for a given feature vector x = (x1, x2) as follows:

    ŷ = h(x) = sign(w1 x1 + w2 x2 + b),   where   sign(a) = +1 if a ≥ 0, and −1 if a < 0,

and w1, w2, b ∈ R are the parameters.

• In a bagging model Hbagging of the T linear classifiers, we calculate the average prediction
  of the classifiers {ht}, and then use it to predict the class ŷbagging:

      ŷbagging = Hbagging(x) = sign( (1/T) Σ_{t=1}^{T} ht(x) )

• In a boosting model Hboosting of the T linear classifiers, we calculate the weighted sum
  of the predictions of the classifiers {ht}, and then use it to predict the class ŷboosting:

      ŷboosting = Hboosting(x) = sign( Σ_{t=1}^{T} αt ht(x) )

  where {αt, t = 1, ..., T} are the weight coefficients. (A small Python sketch of these two
  aggregation rules follows below.)
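To make the two aggregation rules concrete, here is a minimal optional Python sketch (not part of the assignment; the classifiers below are made up for illustration and are not the ones defined next, and the names bagging_predict and boosting_predict are my own):

def sign(a):
    """sign(a) as defined above: +1 if a >= 0, -1 otherwise."""
    return 1 if a >= 0 else -1

def bagging_predict(classifiers, x):
    """H_bagging(x): sign of the average prediction of the classifiers."""
    return sign(sum(h(x) for h in classifiers) / len(classifiers))

def boosting_predict(classifiers, alphas, x):
    """H_boosting(x): sign of the alpha-weighted sum of the predictions."""
    return sign(sum(a * h(x) for a, h in zip(alphas, classifiers)))

# Hypothetical linear classifiers of the form sign(w1*x1 + w2*x2 + b),
# used only to exercise the two aggregators.
hs = [lambda x: sign(x[0] + x[1] - 1),
      lambda x: sign(-x[0] + 2 * x[1]),
      lambda x: sign(x[0] - x[1] + 0.5)]
print(bagging_predict(hs, (0.5, 0.5)))                    # average, then take the sign
print(boosting_predict(hs, [0.5, 0.3, 0.2], (0.5, 0.5)))  # weighted sum, then take the sign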

In this problem, suppose we have 3 linear classifiers (i.e. T = 3):

    h1(x) = sign(x1 + x2 + 1),   h2(x) = sign(x1 − x2),   h3(x) = sign(x1 − 2x2 + 1).

Please answer the questions below:

1. (10 pts) Please calculate ŷbagging for the feature vector x = (1, 2) using bagging over these
three classifiers.

2. (10 pts) Please calculate ŷboosting for the feature vector x = (1, 2) using boosting over these
three classifiers. The weight coefficients are α1 = 0.8, α2 = 0.2, α3 = 0.3.

Attached notebook: HW4_Decision_Tree.ipynb

Part I. Implement a decision tree algorithm and make predictions.
In [16]: import numpy as np

In [30]: class TreeNode:
    """ Node class in the decision tree. """
    def __init__(self, T):
        self.type = 'leaf'   # Type of current node. Could be 'leaf' or 'branch'.
        self.left = None     # Left branch of the tree (for leaf node, it is None).
        self.right = None    # Right branch of the tree (for leaf node, it is None).
        self.dataset = T     # Dataset of current node, which is a tuple (X, Y).
                             # X is the feature array and Y is the label vector.

    def set_as_leaf(self, common_class):
        """ Set current node as leaf node. """
        self.type = 'leaf'
        self.left = None
        self.right = None
        self.common_class = common_class

    def set_as_branch(self, left_node, right_node, split_rule):
        """ Set current node as branch node. """
        self.type = 'branch'
        self.left = left_node
        self.right = right_node
        # split_rule should be a tuple (j, t).
        # When x_j <= t, it goes to the left branch.
        # When x_j > t, it goes to the right branch.
        self.split_rule = split_rule

In [31]: # Prepare for dataset.

def get_dataset():
    X = np.array(
        [[1.0, 2.0],
         [2.0, 2.0],
         [3.0, 2.0],
         [2.0, 3.0],
         [1.0, 3.0]])
    Y = np.array(
        [1,
         1,
         0,
         0,
         0])
    T = (X, Y)  # The dataset T is a tuple of feature array X and label vector Y.
    return T

T = get_dataset()

In this part, you are required to implement the decision tree algorithm described in the
problem statement of Q2 in HW4.


The 4 steps are marked in comments of the following code. Please fill in the missing
blanks (e.g. "...") in the TODOs:

In [32]: # Initialization.
root_node = TreeNode(T)

In [88]: # Procedure for current node.

def build_decision_tree_procedure(node_cur, depth=0):
    # Step 1. Check if all data points in T_cur are in the same class.
    #   - If it is true, set current node as a *leaf node* to predict the common class,
    #     and then terminate current procedure.
    #   - If it is false, continue the procedure.

    T_cur = node_cur.dataset
    X_cur, Y_cur = T_cur  # Get current feature array X_cur and label vector Y_cur.
    if (Y_cur == 1).all():
        print('    ' * depth + '+-> leaf node (predict 1).')
        print('    ' * depth + '    Gini: {:.3f}'.format(Gini(T_cur)))
        print('    ' * depth + '    samples: {}'.format(len(X_cur)))
        node_cur.set_as_leaf(1)
        return
    elif (Y_cur == 0).all():
        print('    ' * depth + '+-> leaf node (predict 0).')
        print('    ' * depth + '    Gini: {:.3f}'.format(Gini(T_cur)))
        print('    ' * depth + '    samples: {}'.format(len(X_cur)))
        node_cur.set_as_leaf(0)
        return

    # Step 2. Traverse all possible splitting rules.
    #   - We will traverse the rules over all feature dimensions j in {0, 1} and
    #     thresholds t in X_cur[:, j] (i.e. all x_j in current feature array X_cur).
    all_rules = []

    #### TODO 1 STARTS ###

    # Please traverse the rules over all feature dimensions j in {0, 1} and
    # thresholds t in X_cur[:, j] (i.e. all x_j in current feature array X_cur),
    # and save all rules in the all_rules variable.
    # The all_rules variable should be a list of tuples such as [(0, 1.0), (0, 2.0), ...].

    for j in range(2):
        for t in range(len(X_cur[:, j])):
            all_rules.append((j, X_cur[t, j]))
    all_rules_set = set(all_rules)
    all_rules = list(all_rules_set)
    all_rules.sort()
    #### TODO 1 ENDS ###

    # print('All rules:', all_rules)  # Code for debugging.

    # Step 3. Decide the best splitting rule.

    best_rule = (None, None)  # Will be replaced by the best (j, t) found below.
    best_weighted_sum = 1.0
    for (j, t) in all_rules:

        #### TODO 2 STARTS ###

        # For each splitting rule (j, t), we use it to split the dataset T_cur into T1, T2.
        # Hint: You may refer to Step 4 to understand how to set inds1, X1, Y1, etc.

        # - Create subset T1.
        inds1 = X_cur[:, j] <= t  # Indices vector for those data points with x_j <= t.
        X1 = X_cur[inds1]         # Feature array with inds1 in X_cur.
        Y1 = Y_cur[inds1]         # Label vector with inds1 in Y_cur.
        T1 = (X1, Y1)             # Subset T1 contains feature array and label vector.
        len_T1 = len(T1[0])       # Size of subset T1.
        # - Create subset T2.
        inds2 = X_cur[:, j] > t   # Indices vector for those data points with x_j > t.
        X2 = X_cur[inds2]         # Feature array with inds2 in X_cur.
        Y2 = Y_cur[inds2]         # Label vector with inds2 in Y_cur.
        T2 = (X2, Y2)             # Subset T2 contains feature array and label vector.
        len_T2 = len(T2[0])       # Size of subset T2.
        #### TODO 2 ENDS ###

        # Calculate weighted sum and try to find the best one.

        weighted_sum = (len_T1 * Gini(T1) + len_T2 * Gini(T2)) / (len_T1 + len_T2)

        # print('Rule:', (j, t), 'len_T1, len_T2:', len_T1, len_T2, 'weighted_sum:', weighted_sum)  # Code for debugging.

        if weighted_sum < best_weighted_sum:

            #### TODO 3 STARTS ####

            # Update the best rule and best weighted sum with current ones.

            best_rule = (j, t)
            best_weighted_sum = weighted_sum
            #### TODO 3 ENDS ####

    # Step 4. - We split the dataset T_cur into two subsets best_T1, best_T2 following
    #           the best splitting rule (best_j, best_t).
    #         - Then we set current node as a *branch* node and create child nodes with
    #           the subsets best_T1, best_T2 respectively.
    #         - For each child node, start from *Step 1* again recursively.

    best_j, best_t = best_rule

    # - Create subset best_T1 and corresponding child node.
    best_inds1 = X_cur[:, best_j] <= best_t
    best_X1 = X_cur[best_inds1]
    best_Y1 = Y_cur[best_inds1]
    best_T1 = (best_X1, best_Y1)
    node1 = TreeNode(best_T1)
    # - Create subset best_T2 and corresponding child node.
    best_inds2 = X_cur[:, best_j] > best_t
    best_X2 = X_cur[best_inds2]
    best_Y2 = Y_cur[best_inds2]
    best_T2 = (best_X2, best_Y2)
    node2 = TreeNode(best_T2)
    # - Set current node as branch node and create child nodes.
    node_cur.set_as_branch(left_node=node1, right_node=node2, split_rule=best_rule)
    print('    ' * depth + '+-> branch node')
    print('    ' * depth + '    Gini: {:.3f}'.format(Gini(T_cur)))
    print('    ' * depth + '    samples: {}'.format(len(X_cur)))
    # - For each child node, start from Step 1 again recursively.
    print('    ' * (depth + 1) + '|-> left branch: x_{} <= {} (with {} data point(s)).'.format(best_j, best_t, len(best_X1)))
    build_decision_tree_procedure(node1, depth + 1)  # Note: The depth is only used for printing.
    print('    ' * (depth + 1) + '|-> right branch: x_{} > {} (with {} data point(s)).'.format(best_j, best_t, len(best_X2)))
    build_decision_tree_procedure(node2, depth + 1)

def Gini(Ti):
    """ Calculate the Gini index given dataset Ti. """
    Xi, Yi = Ti  # Get the feature array Xi and label vector Yi.

    if len(Yi) == 0:  # If the dataset Ti is empty, it simply returns 0.
        return 0

    num = 0
    for i in range(len(Yi)):
        if Yi[i] == 1:
            num += 1

    #### TODO 4 STARTS ####

    # Implement the Gini index function.

    P_Y1 = num / len(Yi)              # Estimate probability P(Y=1) in Yi.
    P_Y0 = (len(Yi) - num) / len(Yi)  # Estimate probability P(Y=0) in Yi.
    Gini_Ti = 1 - P_Y1**2 - P_Y0**2   # Calculate Gini index: Gini_Ti = 1 - P(Y=1)^2 - P(Y=0)^2.
    #### TODO 4 ENDS ####

    return Gini_Ti

After you finish filling in the blanks above, you can use the following code to build the
decision tree. The code also prints the structure of the tree.

In [90]: # Build the decision tree.

build_decision_tree_procedure(root_node)

# If your code is correct, you should output:
#
# +-> branch node
#     Gini: 0.480
#     samples: 5
#     |-> left branch: x_1 <= 2.0 (with 3 data point(s)).
#     +-> branch node
#         Gini: 0.444
#         samples: 3
#         .....
#
# You can also use the sklearn results to validate your decision tree
# (the threshold could be slightly different but the structure of the tree should be the same).

+-> branch node
    Gini: 0.480
    samples: 5
    |-> left branch: x_1 <= 2.0 (with 3 data point(s)).
    +-> branch node
        Gini: 0.444
        samples: 3
        |-> left branch: x_0 <= 2.0 (with 2 data point(s)).
        +-> leaf node (predict 1).
            Gini: 0.000
            samples: 2
        |-> right branch: x_0 > 2.0 (with 1 data point(s)).
        +-> leaf node (predict 0).
            Gini: 0.000
            samples: 1
    |-> right branch: x_1 > 2.0 (with 2 data point(s)).
    +-> leaf node (predict 0).
        Gini: 0.000
        samples: 2

With the obtained decision tree, you can predict the class of new feature vectors:


In [91]: def decision_tree_predict(node_cur, x):
    if node_cur.type == 'leaf':
        return node_cur.common_class
    else:
        j, t = node_cur.split_rule
        if x[j] <= t:
            return decision_tree_predict(node_cur.left, x)
        else:
            return decision_tree_predict(node_cur.right, x)

In [92]: for x in [(2, 1), (3, 1), (3, 3)]:
    y_pred = decision_tree_predict(root_node, x)
    print('Prediction of {} is {}'.format(x, y_pred))

Prediction of (2, 1) is 1
Prediction of (3, 1) is 0
Prediction of (3, 3) is 0

Part II. Use Scikit-learn to build the tree and make predictions.
The following code uses Scikit-learn to build the decision tree. You can use it to check if
your previous implementation is correct or not.

In [93]: # Ref: https://scikit-learn.org/stable/modules/tree.html#tree-algorithms-id3-c4-


from sklearn import tree
X, Y = T
clf = tree.DecisionTreeClassifier()
clf = clf.fit(X, Y)

The following code illustrates the obtained decision tree. It should have the same structure
and similar rules as the tree from your own implementation.

In [94]: # Plotting the tree.

tree.plot_tree(clf)

Out[94]: [Text(0.6, 0.8333333333333334, 'x[1] <= 2.5\ngini = 0.48\nsamples = 5\nvalue = [3, 2]'),
 Text(0.4, 0.5, 'x[0] <= 2.5\ngini = 0.444\nsamples = 3\nvalue = [1, 2]'),
 Text(0.2, 0.16666666666666666, 'gini = 0.0\nsamples = 2\nvalue = [0, 2]'),
 Text(0.6, 0.16666666666666666, 'gini = 0.0\nsamples = 1\nvalue = [1, 0]'),
 Text(0.8, 0.5, 'gini = 0.0\nsamples = 2\nvalue = [2, 0]')]

[Figure: the rendered decision tree plot is not reproduced in this export.]
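Since the rendered figure is not reproduced here, an optional alternative (not part of the original notebook) is to print the fitted sklearn tree in text form with tree.export_text:

print(tree.export_text(clf, feature_names=['x1', 'x2']))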


The following code makes predictions using the obtained decision tree. It should produce
results identical to those of your own implementation.

In [95]: # Predict the class.

for x in [(2, 1), (3, 1), (3, 3)]:
    y_pred = clf.predict(np.array([x]))[0]
    print('Prediction of {} is {}'.format(x, y_pred))

Prediction of (2, 1) is 1
Prediction of (3, 1) is 0
Prediction of (3, 3) is 0
