
Ex.No.6 BUILD DECISION TREES


Date:
AIM
To build a decision tree classifier using the Iris dataset.

ALGORITHM

Step-1: Begin the tree with the root node, say S, which contains the complete dataset.

Step-2: Find the best attribute in the dataset using an Attribute Selection Measure (ASM).

Step-3: Divide S into subsets that contain the possible values of the best attribute.

Step-4: Generate a decision tree node that contains the best attribute.

Step-5: Recursively make new decision trees using the subsets of the dataset created in
Step-3. Continue this process until a stage is reached where the nodes cannot be classified
any further; such a final node is called a leaf node. (A minimal sketch of this recursion is
given below.)
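The recursion in Steps 1-5 can be sketched directly in Python. The following is only a minimal illustration of the idea (the names Node, gini, best_split, majority_leaf, and build_tree are hypothetical helpers, not part of the program below); scikit-learn's DecisionTreeClassifier carries out the same splitting internally.

import numpy as np

class Node:
    def __init__(self, feature=None, threshold=None, left=None, right=None, label=None):
        self.feature, self.threshold = feature, threshold
        self.left, self.right, self.label = left, right, label

def gini(y):
    # Gini index of a set of labels: 1 - sum(p_k^2)
    _, counts = np.unique(y, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)

def best_split(X, y):
    # Attribute Selection Measure: choose the (feature, threshold) with the lowest weighted Gini index
    best_f, best_t, best_score = None, None, np.inf
    for f in range(X.shape[1]):
        for t in np.unique(X[:, f]):
            left, right = y[X[:, f] <= t], y[X[:, f] > t]
            if len(left) == 0 or len(right) == 0:
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
            if score < best_score:
                best_f, best_t, best_score = f, t, score
    return best_f, best_t

def majority_leaf(y):
    # Leaf node labelled with the majority class
    values, counts = np.unique(y, return_counts=True)
    return Node(label=values[np.argmax(counts)])

def build_tree(X, y, depth=0, max_depth=3):
    # Stop when the node is pure or the depth limit is reached (Step-5)
    if len(np.unique(y)) == 1 or depth == max_depth:
        return majority_leaf(y)
    f, t = best_split(X, y)          # Step-2: best attribute via ASM
    if f is None:
        return majority_leaf(y)
    mask = X[:, f] <= t              # Step-3: divide S into subsets
    return Node(feature=f, threshold=t,
                left=build_tree(X[mask], y[mask], depth + 1, max_depth),
                right=build_tree(X[~mask], y[~mask], depth + 1, max_depth))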

We will use the Iris dataset to build a decision tree classifier. The dataset contains
information for three classes of the Iris plant, namely Iris Setosa, Iris Versicolour, and Iris
Virginica, with the following attributes: sepal length, sepal width, petal length, and petal width.
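For reference, the Iris data described above can also be loaded directly from scikit-learn to inspect its attributes and classes. This snippet is only illustrative; the program below reads a CSV file from the UCI repository instead.

from sklearn.datasets import load_iris

iris = load_iris()
print(iris.feature_names)   # sepal length, sepal width, petal length, petal width (cm)
print(iris.target_names)    # ['setosa' 'versicolor' 'virginica']
print(iris.data.shape)      # (150, 4)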
PROGRAM:
# Importing the required packages
import numpy as np
import pandas as pd
from sklearn.metrics import confusion_matrix, accuracy_score, classification_report
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier, plot_tree
import matplotlib.pyplot as plt

Data Import and Exploration:

# Function to import the dataset
def importdata():
    balance_data = pd.read_csv(
        'https://archive.ics.uci.edu/ml/machine-learning-' +
        'databases/balance-scale/balance-scale.data',
        sep=',', header=None)

    # Displaying dataset information
    print("Dataset Length: ", len(balance_data))
    print("Dataset Shape: ", balance_data.shape)
    print("Dataset: ", balance_data.head())

    return balance_data

Data Splitting:

# Function to split the dataset into features and target variables
def splitdataset(balance_data):

    # Separating the target variable
    X = balance_data.values[:, 1:5]
    Y = balance_data.values[:, 0]

    # Splitting the dataset into train and test sets
    X_train, X_test, y_train, y_test = train_test_split(
        X, Y, test_size=0.3, random_state=100)

    return X, Y, X_train, X_test, y_train, y_test

Training with Gini Index:

def train_using_gini(X_train, X_test, y_train):

    # Creating the classifier object
    clf_gini = DecisionTreeClassifier(criterion="gini",
                                      random_state=100, max_depth=3,
                                      min_samples_leaf=5)

    # Performing training
    clf_gini.fit(X_train, y_train)
    return clf_gini

Training with Entropy:

def train_using_entropy(X_train, X_test, y_train):

    # Decision tree with entropy
    clf_entropy = DecisionTreeClassifier(
        criterion="entropy", random_state=100,
        max_depth=3, min_samples_leaf=5)

    # Performing training
    clf_entropy.fit(X_train, y_train)
    return clf_entropy

Prediction and Evaluation:

# Function to make predictions
def prediction(X_test, clf_object):
    y_pred = clf_object.predict(X_test)
    print("Predicted values:")
    print(y_pred)
    return y_pred

# Function to calculate accuracy and print evaluation metrics
def cal_accuracy(y_test, y_pred):
    print("Confusion Matrix: ",
          confusion_matrix(y_test, y_pred))
    print("Accuracy : ",
          accuracy_score(y_test, y_pred) * 100)
    print("Report : ",
          classification_report(y_test, y_pred))

Plotting the Decision Tree:

# Function to plot the decision tree
def plot_decision_tree(clf_object, feature_names, class_names):
    plt.figure(figsize=(15, 10))
    plot_tree(clf_object, filled=True, feature_names=feature_names,
              class_names=class_names, rounded=True)
    plt.show()

if __name__ == "__main__":
    data = importdata()
    X, Y, X_train, X_test, y_train, y_test = splitdataset(data)

    clf_gini = train_using_gini(X_train, X_test, y_train)
    clf_entropy = train_using_entropy(X_train, X_test, y_train)

    # Visualizing the decision trees
    plot_decision_tree(clf_gini, ['X1', 'X2', 'X3', 'X4'], ['L', 'B', 'R'])
    plot_decision_tree(clf_entropy, ['X1', 'X2', 'X3', 'X4'], ['L', 'B', 'R'])

OUTPUT:

DATA INFO
Dataset Length: 625
Dataset Shape: (625, 5)
Dataset: 0 1 2 3 4
0 B 1 1 1 1
1 R 1 1 1 2
2 R 1 1 1 3
3 R 1 1 1 4
4 R 1 1 1 5
RESULT:

Thus, the decision tree for the Iris dataset was built, executed, and verified.
Ex.No.7 BUILD AN SVM MODEL
Date:

AIM
To build an SVM model using scikit-learn and visualize its decision boundaries with matplotlib.

SUPPORT VECTOR MACHINE (SVM)

Support Vector Machine (SVM) is a powerful machine learning algorithm used for linear or
nonlinear classification, regression, and even outlier detection tasks.

Support Vector Machine Terminology

1. Hyperplane: The hyperplane is the decision boundary used to separate the data points of
different classes in a feature space. In the case of linear classification, it is a linear
equation, i.e. wx + b = 0.
2. Support Vectors: Support vectors are the data points closest to the hyperplane; they play
a critical role in deciding the hyperplane and the margin.
3. Margin: The margin is the distance between the support vectors and the hyperplane. The main
objective of the support vector machine algorithm is to maximize this margin, since a wider
margin indicates better classification performance.
4. Kernel: The kernel is a mathematical function used in SVM to map the original input data
points into a high-dimensional feature space, so that the hyperplane can be found even if the
data points are not linearly separable in the original input space. Some common kernel
functions are linear, polynomial, radial basis function (RBF), and sigmoid. (A short
illustration of these terms follows this list.)
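As a quick illustration of these terms (a minimal sketch on a hypothetical toy dataset, not part of the main program in this exercise), the snippet below fits a linear SVC and reads the hyperplane coefficients w and b, the support vectors, and the margin width 2/||w|| from the fitted estimator; the kernel argument selects the kernel function.

import numpy as np
from sklearn import svm

# Two tiny, linearly separable classes (illustrative data)
X = np.array([[1.0, 1.0], [2.0, 2.5], [1.5, 2.0],
              [4.0, 4.5], [5.0, 5.0], [4.5, 4.0]])
y = np.array([0, 0, 0, 1, 1, 1])

clf = svm.SVC(kernel='linear', C=1.0)   # kernel may also be 'poly', 'rbf', or 'sigmoid'
clf.fit(X, y)

w, b = clf.coef_[0], clf.intercept_[0]  # hyperplane: w.x + b = 0
print("Support vectors:\n", clf.support_vectors_)
print("Margin width:", 2.0 / np.linalg.norm(w))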

Let's import the packages:

import numpy as np
import pandas as pd
import sklearn
import sklearn.datasets as ds
import sklearn.model_selection as ms
import sklearn.svm as svm
import matplotlib.pyplot as plt
%matplotlib inline

We generate 2D points and assign a binary label according to a linear operation on the
coordinates:

X = np.random.randn(200, 2)
y = X[:, 0] + X[:, 1] > 1

We now fit a linear Support Vector Classifier (SVC). This classifier tries to separate the two
groups of points with a linear boundary (a line here, but more generally a hyperplane):

# We train the classifier.
est = svm.LinearSVC()
est.fit(X, y)

We define a function that displays the boundaries and decision function of a trained classifier:

# We generate a grid in the square [-3, 3]^2.
xx, yy = np.meshgrid(np.linspace(-3, 3, 500),
                     np.linspace(-3, 3, 500))

# This function takes an SVM estimator as input.
def plot_decision_function(est, title):

    # We evaluate the decision function on the grid.
    Z = est.decision_function(np.c_[xx.ravel(),
                                    yy.ravel()])
    Z = Z.reshape(xx.shape)
    cmap = plt.cm.Blues

    # We display the decision function on the grid.
    fig, ax = plt.subplots(1, 1, figsize=(5, 5))
    ax.imshow(Z,
              extent=(xx.min(), xx.max(),
                      yy.min(), yy.max()),
              aspect='auto',
              origin='lower',
              cmap=cmap)

    # We display the boundaries.
    ax.contour(xx, yy, Z, levels=[0],
               linewidths=2,
               colors='k')

    # We display the points with their true labels.
    ax.scatter(X[:, 0], X[:, 1],
               s=50, c=.5 + .5 * y,
               edgecolors='k',
               lw=1, cmap=cmap,
               vmin=0, vmax=1)
    ax.axhline(0, color='k', ls='--')
    ax.axvline(0, color='k', ls='--')
    ax.axis([-3, 3, -3, 3])
    ax.set_axis_off()
    ax.set_title(title)
    return ax

Let's take a look at the classification results with the linear SVC:

ax = plot_decision_function(
    est, "Linearly separable, linear SVC")

We now modify the labels with an XOR function. A point's label is 1 if its coordinates have
different signs. This classification problem is not linearly separable, so a linear SVC fails
completely:

y = np.logical_xor(X[:, 0] > 0, X[:, 1] > 0)

# We train the classifier.
est = ms.GridSearchCV(svm.LinearSVC(),
                      {'C': np.logspace(-3., 3., 10)})
est.fit(X, y)
print("Score: {0:.1f}".format(
    ms.cross_val_score(est, X, y).mean()))

# We plot the decision function.
ax = plot_decision_function(
    est, "XOR, linear SVC")

The SVC classifier in scikit-learn uses the Radial Basis Function (RBF) kernel by default, which can handle this non-linear problem:

y = np.logical_xor(X[:, 0] > 0, X[:, 1] > 0)

est = ms.GridSearchCV(
    svm.SVC(), {'C': np.logspace(-3., 3., 10),
                'gamma': np.logspace(-3., 3., 10)})
est.fit(X, y)
print("Score: {0:.3f}".format(
    ms.cross_val_score(est, X, y).mean()))

plot_decision_function(
    est.best_estimator_, "XOR, non-linear SVC")

RESULT:
Thus, the SVM model was built using scikit-learn and its decision boundaries were visualized with matplotlib.
