Merging Result-Merged
Merging Result-Merged
Instructions: Please answer the questions below, attach your code in the document, and
insert figures to create a single PDF file. You may search information online but you will
need to write code/find solutions to answer the questions yourself.
i x1 x2 x3 y
1 0 0 1 1
2 0 1 1 1
3 1 1 0 1
4 0 0 1 1
5 0 1 0 0
6 1 1 0 0
7 1 0 0 0
8 0 0 1 0
In Naı̈ve Bayes model, we use random variable Xi 2 {0, 1} to represent i-th dimension of the
feature vector x, and random variable Y 2 {0, 1} to represent the class label y. Thus, we
can estimate probabilities P (Y ), P (Xi |Y ) and P (Xi , Y ) by counting data points in dataset
S, for example:
1
It is noteworthy that only probabilities P (Y ), P (Xi |Y ) and P (Xi , Y ) can be directly esti-
mated from dataset S in Naı̈ve Bayes model. Other joint probabilities (e.g. P (X1 , X2 ) and
P (X1 , X2 , X3 )) should not be estimated by directly counting the data points.
Next, we can use the probabilities P (Y ) and P (Xi |Y ) to build our Naı̈ve Bayes model for
classification: For a feature vector x = (x1 , x2 , x3 ), we can estimate the probability P (Y =
y|X1 = x1 , X2 = x2 , X3 = x3 ) with the conditional independence assumptions:
P (X1 = x1 , X2 = x2 , X3 = x3 , Y = y)
P (Y = y|X1 = x1 , X2 = x2 , X3 = x3 ) =
P (X1 = x1 , X2 = x2 , X3 = x3 )
P (X1 = x1 , X2 = x2 , X3 = x3 |Y = y)P (Y = y)
=
P (X1 = x1 , X2 = x2 , X3 = x3 )
⇣Q ⌘
3
i=1 P (X i = x i |Y = y) P (Y = y)
=
P (X1 = x1 , X2 = x2 , X3 = x3 )
Finally, if we find:
Note that these probabilities can be directly estimated by counting from dataset S.
2
2. (18 pts) Please calculate the probability P (Y = 1|X1 = 1, X2 = 1, X3 = 0) in Naı̈ve
Bayes model using conditional independence assumptions.
3
2 (40 points) Decision Tree
In this question, we would like to create a decision tree model for a binary classification task.
Assume there is a classification dataset T = {(x(i) , y (i) ), i = 1, ..., 5} where each data point
(x, y) contains a feature vector x = (x1 , x2 ) 2 R2 and a ground-truth label y 2 {0, 1}. The
dataset T can be read from the table below:
i x1 x2 y
1 1.0 2.0 1
2 2.0 2.0 1
3 3.0 2.0 0
4 2.0 3.0 0
5 1.0 3.0 0
To build the decision tree model, we use a simplified CART algorithm, which is a recursive
procedure as follows:
• Initialize a root node with dataset T and set it as current node.
• Start a procedure for current node:
– Step 1: Assume the dataset in current node is Tcur . Check if all data points in
Tcur are in the same class:
∗ If it is true, set current node as a leaf node to predict the common class in
Tcur , and then terminate current procedure.
∗ If it is false, continue the procedure.
– Step 2: Traverse all possible splitting rules. Each splitting rule is represented by
a vector (j, t), which compares feature xj and threshold t to split the dataset Tcur
into two subsets T1 , T2 :
T1 = {(x, y) 2 Tcur where xj t},
T2 = {(x, y) 2 Tcur where xj > t}.
We will traverse the rules over all feature dimensions j 2 {0, 1} and thresholds
t 2 {xj |(x, y) 2 Tcur }.
– Step 3: Decide the best splitting rule. The best splitting rule (j ⇤ , t⇤ ) minimizes
the weighted sum of Gini indices of T1 , T2 :
|T1 |Gini(T1 ) + |T2 |Gini(T2 )
(j ⇤ , t⇤ ) = arg min
j,t |T1 | + |T2 |
where the Gini(·) is defined as:
1
X
Gini(Ti ) = 1 P (Y = y)2 ,
y=0
4
If we run the above decision tree building procedure on dataset T and find the generated tree
is shown below:
T
x2 2.0 x2 > 2.0
T1⇤ T2⇤
x1 2.0 x1 > 2.0
⇤ ⇤
T11 T12
5
3. (12 pts) With the given tree, we can predict the class of a feature vector x = (x1 , x2 ):
Please predict the following feature vectors using the given tree:
(1) x = (2, 1),
(2) x = (3, 1),
(3) x = (3, 3).
4. (Bonus Question, 10 pts extra) In this question, you need to implement the decision
tree algorithm. Please download the Jupyter notebook HW4 Decision Tree.ipynb and
fill in the blanks. Note that since the same dataset T is used in the notebook, you can
use the code to check if your previous answers are correct or not. Please attach your
code and results in Gradescope submission.
6
3 (20 points) Bagging and Boosting
Assume we obtain T linear classifiers {ht , t = 1, ..., T } where each classifier h : R2 ! {+1, 1}
predicts the class ŷ 2 {+1, 1} with given feature vector x = (x1 , x2 ) as follows:
(
+1 if a 0,
ŷ = h(x) = sign(w1 x1 + w2 x2 + b) where sign(a) =
1 if a < 0.
• In a bagging model Hbagging of the T linear classifiers, we calculate the average prediction
using classifiers {ht }, and then use it to predict the class ŷbagging :
⇣1 X
T ⌘
ŷbagging = Hbagging (x) = sign ht (x)
T t=1
• In a boosting model Hboosting of the T linear classifiers, we calculate the weighted sum
of predictions using classifiers {ht }, and then use it to predict the class ŷboosting :
⇣X
T ⌘
ŷboosting = Hboosting (x) = sign ↵t ht (x)
t=1
1. (10 pts) Please calculate the ŷbagging of feature vector x = (1, 2) using bagging on these
three classifiers.
7
2. (10 pts) Please calculate the ŷboosting of feature vector x = (1, 2) using boosting on these
three classifiers. The weight coefficients are ↵1 = 0.8, ↵2 = 0.2, ↵3 = 0.3.
8
5/25/23, 4:06 PM HW4_Decision_Tree
T = get_dataset()
In this part, you are required to implement the decision tree algorithm shown in the
problem description of Q2 in HW4:
localhost:8888/nbconvert/html/HW4_Decision_Tree.ipynb?download=false 1/6
5/25/23, 4:06 PM HW4_Decision_Tree
The 4 steps are marked in comments of the following code. Please fill in the missing
blanks (e.g. "...") in the TODOs:
In [32]: # Initialization.
root_node = TreeNode(T)
T_cur = node_cur.dataset
X_cur, Y_cur = T_cur # Get current feature array X_cur and label vector Y_c
if (Y_cur == 1).all():
print(' ' * depth + '+-> leaf node (predict 1).')
print(' ' * depth + ' Gini: {:.3f}'.format(Gini(T_cur)))
print(' ' * depth + ' samples: {}'.format(len(X_cur)))
node_cur.set_as_leaf(1)
return
elif (Y_cur == 0).all():
print(' ' * depth + '+-> leaf node (predict 0).')
print(' ' * depth + ' Gini: {:.3f}'.format(Gini(T_cur)))
print(' ' * depth + ' samples: {}'.format(len(X_cur)))
node_cur.set_as_leaf(0)
return
for j in range(2):
for t in range(len(X_cur[:, j])):
all_rules.append((j, X_cur[t, j]))
all_rules_set = set(all_rules)
all_rules = list(all_rules_set)
all_rules.sort()
#### TODO 1 ENDS ###
localhost:8888/nbconvert/html/HW4_Decision_Tree.ipynb?download=false 2/6
5/25/23, 4:06 PM HW4_Decision_Tree
best_rule = (j, t)
best_weighted_sum = weighted_sum
#### TODO 3 ENDS ####
# Step 4. - We split the dataset T_cur into two subsets best_T1, best_T2 fol
# the best splitting rule (best_j, best_t).
# - Then we set current node as a *branch* node and create child nod
# the subsets best_T1, best_T2 respectively.
# - For each child node, start from *Step 1* again recursively.
def Gini(Ti):
""" Calculate the Gini index given dataset Ti. """
Xi, Yi = Ti # Get the feature array Xi and label vector Yi.
localhost:8888/nbconvert/html/HW4_Decision_Tree.ipynb?download=false 3/6
5/25/23, 4:06 PM HW4_Decision_Tree
num = 0
for i in range(len(Yi)):
if Yi[i] == 1 :
num += 1
return Gini_Ti
After you finish the above code blank filling, you can use the following code to build the
decision tree. The following code also shows the structure of the tree.
With the obtained decision tree, you can predict the class of new feature vectors:
localhost:8888/nbconvert/html/HW4_Decision_Tree.ipynb?download=false 4/6
5/25/23, 4:06 PM HW4_Decision_Tree
Prediction of (2, 1) is 1
Prediction of (3, 1) is 0
Prediction of (3, 3) is 0
The following code illustrates the obtained decision tree. It should have same structure
and similar rules compared with the tree in your own implementation.
localhost:8888/nbconvert/html/HW4_Decision_Tree.ipynb?download=false 5/6
5/25/23, 4:06 PM HW4_Decision_Tree
The following code makes the predictions using the obtained decision tree. It should
have identical results as the ones for your own implementaion.
Prediction of (2, 1) is 1
Prediction of (3, 1) is 0
Prediction of (3, 3) is 0
localhost:8888/nbconvert/html/HW4_Decision_Tree.ipynb?download=false 6/6