ML Practical
Practical - 1
Aim: Write a program to demonstrate the working of the decision tree based ID3
algorithm.
Theory:
❖ ID3 Algorithm:
The ID3 algorithm uses entropy and information gain as measures to make decisions about feature
selection and node splitting. It aims to create a decision tree that maximizes the information gain at
each step, leading to a tree that can accurately classify new examples.
❖ Working:
1. Input: The algorithm takes as input a dataset with labelled examples. Each example consists
of a set of features and a corresponding class label.
2. Select the root node: The first step is to select the root node of the decision tree. This is
typically done by choosing the feature that provides the most information gain.
3. Calculate information gain: For each feature, the algorithm calculates the information gain.
Information gain measures how much the entropy of the dataset is reduced by splitting it
based on a particular feature. The feature with the highest information gain is chosen as the
root node.
4. Split the dataset: The dataset is split into subsets based on the selected feature at the root
node. Each subset contains examples that have the same value for the chosen feature.
5. Repeat the process: The algorithm recursively repeats the above steps for each subset
created in the previous step. It calculates the information gain for each remaining feature in
the subset and chooses the one with the highest information gain as the next node in the tree.
This process continues until a stopping criterion is reached.
6. Stopping criterion: The stopping criterion could be reaching a maximum depth for the tree,
having a minimum number of examples at a node, or when all examples in a subset belong to
the same class.
7. Assign class labels: Once the tree is built, class labels are assigned to the leaf nodes. This is
done by taking the majority class of the examples in each leaf node.
8. Predicting with the tree: To predict the class label for a new example, the algorithm
traverses the decision tree based on the example's feature values until it reaches a leaf node.
The class label associated with that leaf node is then assigned as the predicted class label for
the example.
❖ Procedure:
#Step-1: Import python libraries.
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn import metrics
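The listing stops after the imports; below is a minimal sketch of the remaining steps, assuming the Iris dataset and scikit-learn's criterion='entropy', which selects splits by information gain as ID3 does (scikit-learn's tree is CART-based, so this approximates rather than exactly reproduces ID3):
#Step-2: Load the Iris dataset and split it into training and test sets.
iris = datasets.load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.25, random_state=0)
#Step-3: Fit a decision tree that chooses splits by information gain.
clf = DecisionTreeClassifier(criterion='entropy', random_state=0)
clf.fit(X_train, y_train)
#Step-4: Evaluate on the held-out test set.
y_pred = clf.predict(X_test)
print('Accuracy:', metrics.accuracy_score(y_test, y_pred))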
❖ Output:
Practical – 2
Aim: Build an Artificial Neural Network by implementing the Backpropagation
algorithm and test the same using appropriate data sets.
Theory:
❖ Artificial Neural Networks (ANNs):
● The term "Artificial Neural Network" is derived from Biological neural networks that develop
the structure of a human brain. Similar to the human brain that has neurons interconnected to one
another, artificial neural networks also have neurons that are interconnected to one another in
various layers of the networks. These neurons are known as nodes.
● Artificial Neural Networks are computational models inspired by the structure and functioning of
biological neural networks in the human brain. ANNs consist of interconnected nodes, called
neurons, organized into layers.
● The neurons in one layer are connected to neurons in the adjacent layers. The first layer is the
input layer, the last layer is the output layer, and the intermediate layers are called hidden layers.
ANNs are capable of learning from data and making predictions or decisions based on the
learned patterns.
❖ Figure:
❖ Backpropagation Algorithm:
- The backpropagation algorithm is a widely used method for training feedforward neural networks.
It allows the network to learn from labeled training data by adjusting the weights and biases of the
neurons based on the prediction errors.
The key steps in the backpropagation algorithm are as follows:
1. Initialization: Randomly initialize the weights and biases of the neural network.
2. Forward Propagation: Pass an input through the network, calculating the weighted sum of
inputs and applying an activation function to produce an output. This process is repeated for
each layer, propagating the input forward through the network until the output is obtained.
3. Error Calculation: Compute the difference between the predicted output and the actual output
(the error).
4. Backward Propagation: Propagate the error backward through the network, layer by layer.
For each layer, calculate the gradient of the error with respect to the weights and biases using
the chain rule of calculus. This determines how much each weight and bias contributed to the
overall error.
5. Weight and Bias Update: Adjust the weights and biases of the neurons in each layer using the
gradients computed in the previous step. The adjustment is performed by subtracting a
fraction of the gradients multiplied by a learning rate, which controls the step size during
optimization.
6. Repeat: Repeat steps 2 to 5 for a specified number of iterations or until the network's
performance reaches a satisfactory level.
- By iteratively updating the weights and biases using the backpropagation algorithm, the neural
network gradually improves its ability to make accurate predictions or classifications based on the
training data.
- The backpropagation algorithm is an efficient way to train neural networks, but it requires a large
amount of labelled training data and may suffer from issues such as overfitting or getting stuck in
local minima. Regularization techniques, learning rate schedules, and other optimization strategies
are often employed to address these challenges.
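As a concrete illustration of steps 1–6, here is a minimal NumPy sketch of backpropagation for a tiny 2–3–1 network on toy data (the data values, layer sizes, and learning rate are all assumptions made for the example, not part of the original procedure):
import numpy as np

# Toy dataset: two input features -> one target value, scaled to [0, 1].
X = np.array([[2.0, 9.0], [1.0, 5.0], [3.0, 6.0]])
y = np.array([[92.0], [86.0], [89.0]]) / 100.0
X = X / X.max(axis=0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Step 1: Initialization - random weights, zero biases.
rng = np.random.default_rng(0)
w1, b1 = rng.normal(size=(2, 3)), np.zeros((1, 3))  # input -> hidden
w2, b2 = rng.normal(size=(3, 1)), np.zeros((1, 1))  # hidden -> output
lr = 0.5  # learning rate

for epoch in range(5000):
    # Step 2: Forward propagation.
    hidden = sigmoid(X @ w1 + b1)
    output = sigmoid(hidden @ w2 + b2)
    # Step 3: Error calculation (squared-error gradient at the output).
    d_output = (output - y) * output * (1 - output)
    # Step 4: Backward propagation via the chain rule.
    d_hidden = (d_output @ w2.T) * hidden * (1 - hidden)
    # Step 5: Weight and bias update.
    w2 -= lr * hidden.T @ d_output
    b2 -= lr * d_output.sum(axis=0, keepdims=True)
    w1 -= lr * X.T @ d_hidden
    b1 -= lr * d_hidden.sum(axis=0, keepdims=True)
    # Step 6: Repeat - handled by the loop.

print('Predicted:', output.ravel(), '\nActual:   ', y.ravel())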
Procedure:
❖ Task-1: Import libraries, preprocess the training dataset, and select the required features.
❖ Task-3: Create the confusion matrix using the seaborn and matplotlib libraries.
Code:
#Step-1: Import seaborn library.
import seaborn as sns
import matplotlib.pyplot as plt
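Only the imports survive for Task-3; below is a minimal sketch of the heatmap step, assuming the network's test-set labels and predictions are available as y_test and y_pred (both names are assumptions):
from sklearn.metrics import confusion_matrix

# Plot the confusion matrix as a seaborn heatmap.
cm = confusion_matrix(y_test, y_pred)
plt.figure(figsize=(5, 4))
sns.heatmap(cm, annot=True, fmt='d', cmap='Blues')
plt.xlabel('Predicted label')
plt.ylabel('True label')
plt.title('Confusion Matrix')
plt.show()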
❖ Output:
Practical - 3
Aim: Write a program to implement the naïve Bayesian classifier for a sample
training data set stored as a .CSV file. Compute the accuracy of the classifier,
considering few test data sets.
Theory:
❖ What is Naïve Bayesian classifier?
- The Naive Bayes classifier is a simple yet powerful probabilistic machine learning algorithm
that is commonly used for classification tasks. It is based on Bayes' theorem and assumes that the
features are conditionally independent given the class label. This assumption is known as the
"naive" assumption, which simplifies the calculation of probabilities.
- In machine learning, Naïve Bayes classification is a straightforward and powerful algorithm for
the classification task. It is based on applying Bayes' theorem with a strong independence
assumption between the features, and it produces good results when used for textual data
analysis such as Natural Language Processing.
- Naïve Bayes models are also known as simple Bayes or independent Bayes. All these names
refer to the application of Bayes’ theorem in the classifier’s decision rule. Naïve Bayes classifier
applies the Bayes’ theorem in practice. This classifier brings the power of Bayes’ theorem to
machine learning.
- The Naive Bayes classifier is built on the foundation of Bayes' theorem, which relates
conditional probabilities.
❖ Given a feature vector X and a class label y, Bayes' theorem states:
P(y | X) = P(X | y) · P(y) / P(X)
where P(y) is the prior probability of the class, P(X | y) is the likelihood of the features given the class, and P(X) is the evidence, which acts as a normalizing constant.
Procedure:
❖ Task – 1: To pre-process the chosen dataset.
Code:
#Step – 1: Import standard libraries.
import numpy as np
import pandas as pd
from sklearn.preprocessing import LabelEncoder
#Step – 5: Select the relevant columns as features (x) and the target variable (y)
x = playgolf_data[['Outlook_Overcast', 'Outlook_Rainy', 'Outlook_Sunny',
'Temperature']].values
y = playgolf_data['PlayGolf'].values
❖ Task – 2: To perform feature scaling and fitting Naïve Bayes to training dataset.
Code:
#Step – 1: Select the relevant columns as features (x) and the target variable (y).
x = playgolf_data[['Outlook_Overcast', 'Outlook_Rainy', 'Outlook_Sunny', 'Temperature']].values # reuse the one-hot encoded features from Task-1
y = playgolf_data['PlayGolf'].values
#Step – 2: Split the dataset into the training set and test set.
from sklearn.model_selection import train_test_split
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.25, random_state=0)
#Step – 3: Define a Gaussian Naïve Bayes classifier and fit it to the training set.
from sklearn.naive_bayes import GaussianNB
classifier = GaussianNB()
classifier.fit(x_train, y_train)
y_pred = classifier.predict(x_test)
❖ Task – 4: To create confusion matrix for the Naïve Bayes classifier.
#Step - 1: Import the required libraries.
import numpy as np
import pandas as pd
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import confusion_matrix
import seaborn as sns
import matplotlib.pyplot as plt
#Step – 3: Split the dataset into the training set and test set.
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.25, random_state=0)
classifier = GaussianNB()
classifier.fit(x_train, y_train)
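The listing stops after fitting; below is a minimal sketch of the step that actually builds and plots the matrix, using the libraries imported in Step 1:
#Step – 4: Predict on the test set and plot the confusion matrix.
y_pred = classifier.predict(x_test)
cm = confusion_matrix(y_test, y_pred)
plt.figure(figsize=(5, 4))
sns.heatmap(cm, annot=True, fmt='d', cmap='Blues')
plt.xlabel('Predicted label')
plt.ylabel('True label')
plt.title('Naïve Bayes Confusion Matrix')
plt.show()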
❖ Output:
Practical - 4
Aim: Assuming a set of documents that need to be classified, use the naïve
Bayesian Classifier model to perform this task.
Theory:
❖ What is Naïve Bayes Classifier?
- The Naive Bayes classifier is a simple yet powerful probabilistic machine learning algorithm
that is commonly used for classification tasks. It is based on Bayes' theorem and assumes that the
features are conditionally independent given the class label. This assumption is known as the
"naive" assumption, which simplifies the calculation of probabilities.
- In machine learning, Naïve Bayes classification is a straightforward and powerful algorithm for
the classification task. It is based on applying Bayes' theorem with a strong independence
assumption between the features, and it produces good results when used for textual data
analysis such as Natural Language Processing.
- Naïve Bayes models are also known as simple Bayes or independent Bayes. All these names
refer to the application of Bayes’ theorem in the classifier’s decision rule. Naïve Bayes classifier
applies Bayes’ theorem in practice. This classifier brings the power of Bayes’ theorem to
machine learning.
❖ Dataset taken: Document(Sample dataset)
- Document is a set of text documents along with their target values. V is the set of all possible target
values. This function learns the probability terms P(wk | vj), describing the probability that a
randomly drawn word from a document in class vj will be the English word wk.
- No. of Rows: 18
- No. of Columns: 2
❖ Dataset taken:
Procedure:
❖ Task – 1: To import libraries and load dataset.
Code:
#Step – 1: Import standard libraries.
import pandas as pd
❖ Output:
Figure – 4.2: Top 5 records of the dataset
❖ Task – 2: To split the dataset into training and test data.
Code:
#Step – 1: Splitting the dataset into train and test data.
from sklearn.model_selection import train_test_split
xtrain, xtest, ytrain, ytest = train_test_split(X, y, test_size=4, random_state=4)
print('\nThe total number of Training Data:', ytrain.shape)
print('\nThe total number of Test Data:', ytest.shape)
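Only the split step survives in the listing; below is a minimal sketch of the vectorise–train–evaluate steps the aim calls for, assuming a CountVectorizer bag-of-words representation and a MultinomialNB classifier (standard choices for text classification, but assumptions here):
#Step – 2: Vectorise the documents, train the classifier, and evaluate it.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.metrics import accuracy_score

vectorizer = CountVectorizer()
xtrain_dm = vectorizer.fit_transform(xtrain)  # document-term matrix for training docs
xtest_dm = vectorizer.transform(xtest)

clf = MultinomialNB()
clf.fit(xtrain_dm, ytrain)
predicted = clf.predict(xtest_dm)
print('Accuracy of the classifier:', accuracy_score(ytest, predicted))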
❖ Output:
❖ Conclusion:
Successfully performed Naïve Bayes classification of documents using Python libraries and the
Google Colab tool. Performance-wise, the Naïve Bayes classifier compares favourably with many
other classifiers. The final output table is obtained after performing different splits of the training
and testing data. Overall, accuracy improves when an appropriate test-set size is chosen for the
split.
Practical - 5
Aim: Write a program to construct a Bayesian network considering medical
data. Use this model to demonstrate the diagnosis of heart patients using
standard Heart Disease Data Set.
Theory:
❖ What is Naïve Bayes Classifier?
- The Naive Bayes classifier is a simple yet powerful probabilistic machine learning algorithm
that is commonly used for classification tasks. It is based on Bayes' theorem and assumes that the
features are conditionally independent given the class label. This assumption is known as the
"naive" assumption, which simplifies the calculation of probabilities.
- In machine learning, Naïve Bayes classification is a straightforward and powerful algorithm for
the classification task. It is based on applying Bayes' theorem with a strong independence
assumption between the features, and it produces good results when used for textual data
analysis such as Natural Language Processing.
- Naïve Bayes models are also known as simple Bayes or independent Bayes. All these names
refer to the application of Bayes’ theorem in the classifier’s decision rule. Naïve Bayes classifier
applies Bayes’ theorem in practice. This classifier brings the power of Bayes’ theorem to
machine learning.
- The Naive Bayes classifier is built on the foundation of Bayes' theorem, which relates
conditional probabilities. Naïve Bayes Classifier uses the Bayes’ theorem to predict membership
probabilities for each class such as the probability that given record or data point belongs to a
particular class. The class with the highest probability is considered as the most likely class.
❖ The Naive Bayes classifier works as follows:
1.) Training: Given a labelled training dataset, the classifier calculates the prior probability P(y)
for each class in the dataset. It also estimates the likelihood probability P(X|y) for each feature
given each class. This is done by assuming conditional independence between the features.
2.) Prediction: When a new unlabelled instance is presented, the classifier calculates the
posterior probability P(y|X) for each class using Bayes' theorem. It then assigns the class label
with the highest posterior probability as the predicted class for that instance.
3.) Handling Continuous Features: For continuous features, the Naive Bayes classifier
typically assumes a probability distribution, often Gaussian (hence called Gaussian Naive
Bayes), to estimate the likelihood probability.
4.) Laplace Smoothing: To avoid zero probabilities when a feature value in the testing data was
not observed in the training data, Laplace smoothing (also known as additive smoothing) is often
applied. It adds a small constant to the numerator and adjusts the denominator accordingly.
5.) Decision Rule: In some cases, the Naive Bayes classifier can be used for decision making by
considering the posterior probabilities. For example, in binary classification, if P(y=1|X) >
P(y=0|X), the instance is assigned to class 1; otherwise, it is assigned to class 0.
❖ Dataset taken: Heartdisease
- The dataset consists of medical data with seven attributes: age, gender, family history, diet,
lifestyle, cholesterol level, and the presence or absence of heart disease. Each row represents an
individual's information, including their attributes and heart disease status, with binary values
indicating the presence (1) or absence (0) of heart disease.
- No. of Rows: 19
- No. of Columns: 7
Procedure:
#Step – 1: Install the pgmpy library using pip.
!pip install pgmpy
# (Reconstructed) Imports needed by the surviving lines below.
from pgmpy.models import BayesianNetwork
from pgmpy.estimators import MaximumLikelihoodEstimator
# (Reconstructed) Define the network structure; only the two edges below
# survive from the original listing, the rest of the edge list was lost.
model = BayesianNetwork([
('cholestrol', 'heartdisease'),
('diet', 'cholestrol')
])
#Step – 5: Fit the model using MLE.
model.fit(heart_disease, estimator=MaximumLikelihoodEstimator)
print(q)
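print(q) above refers to a query result whose construction was lost from the listing; below is a minimal sketch of the missing inference step, assuming pgmpy's VariableElimination (the evidence variable and value are illustrative only):
#Step – 6: Query the network for a diagnosis.
from pgmpy.inference import VariableElimination

infer = VariableElimination(model)
# P(heartdisease | diet) for an illustrative evidence value.
q = infer.query(variables=['heartdisease'], evidence={'diet': 1})
print(q)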
❖ Output:
❖ Final Output:
❖ Observations:
The Bayesian network constructed in this exercise is designed to predict the presence or absence of
heart disease based on various medical attributes. When we applied the Bayesian network for heart
disease diagnosis using sample attributes, the model produced a probability distribution over the two
possible outcomes (presence or absence of heart disease). In this specific case, the model estimated
an equal probability of 0.5 for both scenarios, indicating a 50% chance of having heart disease and a
50% chance of not having it based on the provided attributes. This demonstrates the probabilistic
nature of the Bayesian network, where it provides uncertainty estimates for different outcomes based
on the available evidence.
❖ Conclusion:
In conclusion, this practical exercise demonstrated the construction of a Bayesian network for
medical data analysis and showcased its application in diagnosing heart disease using the standard
Heart Disease Data Set. The Naïve Bayes Classifier, which is based on Bayes' theorem and the
assumption of conditional independence among features, was utilized to create a probabilistic model.
The key steps included data loading, defining the Bayesian network structure, fitting the model using
Maximum Likelihood Estimator, and performing inference for heart disease diagnosis based on user-
provided attributes. By leveraging such models, healthcare professionals can make more informed
decisions and improve patient care.
Practical - 6
Aim: Apply EM algorithm to cluster a set of data stored in a .CSV file. Use the
same data set for clustering using k-Means algorithm.
Theory:
❖ What is EM Algorithm?
The EM algorithm was proposed and named in a seminal paper published in 1977 by Arthur
Dempster, Nan Laird, and Donald Rubin. Their work formalized the algorithm and demonstrated its
usefulness in statistical modeling and estimation.
The Expectation-Maximization (EM) algorithm is an iterative optimization method that combines
different unsupervised machine learning algorithms to find maximum likelihood or maximum
posterior estimates of parameters in statistical models that involve unobserved latent variables. The
EM algorithm is commonly used for latent variable models and can handle missing data. It consists
of an estimation step (E-step) and a maximization step (M-step), forming an iterative process to
improve model fit.
● In the E step, the algorithm computes the expected values of the latent variables, i.e. the
expectation of the log-likelihood, using the current parameter estimates.
● In the M step, the algorithm determines the parameters that maximize the expected log-
likelihood obtained in the E step, and the model parameters are updated based on the
estimated latent variables.
By iteratively repeating these steps, the EM algorithm seeks to maximize the likelihood of the
observed data. It is commonly used for unsupervised learning tasks, such as clustering, where latent
variables are inferred and has applications in various fields, including ML, computer vision, and
NLP.
The essence of the Expectation-Maximization algorithm is to use the available observed data of the
dataset to estimate the missing data, and then to use that estimate to update the values of the
parameters.
❖ Dataset taken: IRIS Dataset.
- This data set consists of petal and sepal measurements for 3 different types of irises (Setosa,
Versicolour, and Virginica), stored in a 150x4 numpy.ndarray. The rows are the samples and the
columns are: Sepal Length, Sepal Width, Petal Length and Petal Width.
- No. of Rows: 150
- No. of Columns: 4
Procedure:
#Step – 1: Import necessary libraries.
import matplotlib.pyplot as plt
from sklearn import datasets
from sklearn import preprocessing # needed for the StandardScaler used below
from sklearn.cluster import KMeans
import pandas as pd
import numpy as np
iris = datasets.load_iris()
X = pd.DataFrame(iris.data)
scaler = preprocessing.StandardScaler()
scaler.fit(X)
xsa = scaler.transform(X)
xs = pd.DataFrame(xsa, columns=X.columns)
#Step – 9: Build a GMM with 3 components (one per iris species) and fit it to the scaled data.
from sklearn.mixture import GaussianMixture
gmm = GaussianMixture(n_components=3)
gmm.fit(xs)
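The aim also asks for k-Means on the same data; below is a minimal sketch that places k-Means beside the fitted GMM and compares both clusterings against the true species labels (the adjusted Rand index is one reasonable choice of comparison metric, assumed here):
#Step – 10: Cluster with k-Means and compare both clusterings to the true labels.
from sklearn.metrics import adjusted_rand_score

kmeans = KMeans(n_clusters=3, random_state=0)
km_labels = kmeans.fit_predict(xs)
gmm_labels = gmm.predict(xs)

print('k-Means ARI:', adjusted_rand_score(iris.target, km_labels))
print('GMM-EM ARI: ', adjusted_rand_score(iris.target, gmm_labels))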
❖ Conclusion:
In this practical, we applied both K-Means clustering and Gaussian Mixture Models (GMM) with the
Expectation-Maximization (EM) algorithm to cluster the Iris dataset. Our observation revealed that
GMM-EM produced clustering results that closely matched the true labels, indicating its
effectiveness in capturing the dataset's underlying distribution. K-Means, while reasonable, struggled
to account for potential overlaps between clusters. This highlights the importance of selecting the
appropriate clustering method based on the dataset's characteristics and problem requirements.
Practical - 7
Aim: Write a program to implement k-nearest neighbour algorithm to classify
the iris dataset.
Theory:
❖ What is k-nearest neighbour algorithm?
- K-Nearest Neighbour is one of the simplest Machine Learning algorithms, based on the
Supervised Learning technique. The K-NN algorithm assumes similarity between the new
case/data and the available cases and puts the new case into the category that is most similar
to the available categories.
- The K-NN algorithm stores all the available data and classifies a new data point based on
similarity. This means that when new data appears, it can be easily classified into a well-suited
category using the K-NN algorithm. K-NN can be used for Regression as well as for
Classification, but it is mostly used for Classification problems.
- K-NN is a non-parametric algorithm, which means it does not make any assumption about the
underlying data. It is also called a lazy learner algorithm because it does not learn from the
training set immediately; instead it stores the dataset and, at the time of classification,
performs an action on it. At the training phase the KNN algorithm just stores the dataset, and
when it gets new data, it classifies that data into a category that is most similar to the new
data.
- Suppose there are two categories, Category A and Category B, and we have a new data point
x1: which of these categories will the data point lie in? To solve this type of problem, we need
a K-NN algorithm. With the help of K-NN, we can easily identify the category or class of a
particular data point.
Consider the below diagram:
Figure – 7.1: K-Nearest Neighbour
❖ Procedure:
Task-1: Import necessary libraries and load the dataset.
#Step – 1: Import required python libraries.
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt # needed for the plots below
from sklearn.datasets import load_iris # needed for load_iris() below
data = load_iris()
X = data.data
Y = data.target
classes = data.target_names
df = pd.DataFrame(X, columns=data.feature_names)
print(df.head())
plt.figure(figsize=(8, 6))
plt.hist(Y, rwidth=1)
plt.title('Label Distribution')
plt.xlabel('Labels (0 / 1 / 2)')
plt.ylabel('Count')
plt.show()
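The step that splits the data, trains the classifier, and produces y_pred_train, y_pred, and df_classification_rep (all used below) is missing from the listing; below is a minimal sketch, assuming k = 3 neighbours:
# Split the data, train k-NN, and generate predictions (reconstructed step).
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import (confusion_matrix, ConfusionMatrixDisplay,
                             classification_report)

x_train, x_test, y_train, y_test = train_test_split(X, Y, test_size=0.2, random_state=0)
knn = KNeighborsClassifier(n_neighbors=3)
knn.fit(x_train, y_train)
y_pred_train = knn.predict(x_train)
y_pred = knn.predict(x_test)

# Classification report as a DataFrame, used by the tabulate step below.
report = classification_report(y_test, y_pred, target_names=classes, output_dict=True)
df_classification_rep = pd.DataFrame(report).transpose()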
# Step – 2: Compute and plot the confusion matrix for the training set
confusion_train = confusion_matrix(y_train, y_pred_train)
conf_display_train = ConfusionMatrixDisplay(confusion_train, display_labels=classes)
conf_display_train.plot(cmap=plt.cm.Blues, colorbar=False)
plt.title('Training Set Confusion Matrix')
plt.show()
# Step – 3: Compute and plot the confusion matrix for the test set
confusion_test = confusion_matrix(y_test, y_pred)
conf_display_test = ConfusionMatrixDisplay(confusion_test, display_labels=classes)
conf_display_test.plot(cmap=plt.cm.Blues, colorbar=False)
plt.title('Test Set Confusion Matrix')
plt.show()
#Step – 4: Print the classification report in a tabular format.
from tabulate import tabulate # import was missing from the listing
print(tabulate(df_classification_rep, headers='keys', tablefmt='fancy_grid'))
Final Output:
❖ Conclusion:
In conclusion, we successfully implemented the K-nearest neighbors (KNN) algorithm for
classification on the Iris dataset using Python libraries and Google Colab. The KNN algorithm
demonstrated good performance on the Iris dataset, showcasing its effectiveness as a simple and
intuitive classification method. Fine-tuning of parameters such as the value of k and the choice of
distance metric can further enhance the performance of KNN on different datasets.
Practical - 8
Aim: Implement linear regression and logistic regression.
Theory:
❖ What is linear and logistic regression?
Linear Regression and Logistic Regression are two famous Machine Learning algorithms that come
under the supervised learning technique. Since both algorithms are supervised in nature, they use
labelled datasets to make predictions. The main difference between them is how they are used:
Linear Regression is used for solving Regression problems, whereas Logistic Regression is used for
solving Classification problems.
Linear Regression:
❖ Linear Regression is one of the simplest Machine Learning algorithms. It comes under the
Supervised Learning technique and is used for solving regression problems. It is used for
predicting a continuous dependent variable with the help of independent variables. The goal
of Linear Regression is to find the best-fit line that can accurately predict the output for the
continuous dependent variable.
Logistic Regression:
❖ Logistic Regression is one of the most popular Machine Learning algorithms, and it comes
under the Supervised Learning technique. It can be used for Classification as well as for
Regression problems, but it is mainly used for Classification problems. Logistic Regression
is used to predict a categorical dependent variable with the help of independent variables.
The output of a Logistic Regression problem can only be between 0 and 1.
Consider the below diagram:
❖ Procedure:
Task – 1: Import necessary libraries and load the dataset.
#Step – 1: Import required python libraries.
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt # needed for the plots below
from sklearn.datasets import load_iris # needed for load_iris() below
data = load_iris()
X = data.data
Y = data.target
classes = data.target_names
df = pd.DataFrame(X, columns=data.feature_names)
print(df.head())
plt.figure(figsize=(8, 6))
plt.hist(Y, rwidth=1)
plt.title('Label Distribution')
plt.xlabel('Labels (0 / 1 / 2)')
plt.ylabel('Count')
plt.show()
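The training step that produces y_pred_logistic (used below) is missing from the listing; below is a minimal sketch that fits both models, treating the class index as a continuous target for linear regression (illustrative only) and using logistic regression for the actual classification:
# Split the data and fit both models (reconstructed step).
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression, LogisticRegression
from sklearn.metrics import confusion_matrix, ConfusionMatrixDisplay

x_train, x_test, y_train, y_test = train_test_split(X, Y, test_size=0.2, random_state=0)

# Linear regression on the class index (a regression view of the data).
linreg = LinearRegression().fit(x_train, y_train)
print('Linear regression R^2 on the test set:', linreg.score(x_test, y_test))

# Logistic regression for the classification task.
logreg = LogisticRegression(max_iter=200).fit(x_train, y_train)
y_pred_logistic = logreg.predict(x_test)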
# Step – 2: Compute and plot the confusion matrix for logistic regression.
confusion_logistic = confusion_matrix(y_test, y_pred_logistic)
conf_display_logistic = ConfusionMatrixDisplay(confusion_logistic, display_labels=classes)
conf_display_logistic.plot(cmap=plt.cm.Blues, colorbar=False)
plt.title('Logistic Regression Confusion Matrix')
plt.show()
Final Output:
❖ Conclusion:
In conclusion, we successfully implemented linear regression and logistic regression algorithms on
the Iris dataset using Python libraries. Both models showed promising results in their respective
tasks. Linear regression is primarily used for predicting continuous numeric values rather than class
labels. Logistic regression, specifically designed for classification tasks, demonstrated good
performance on the Iris dataset. It achieved high accuracy, precision, recall, and F1 score values,
indicating its effectiveness in distinguishing between different classes of the Iris.
Practical - 9
Aim: Compare various supervised learning algorithms using appropriate dataset.
Theory:
Supervised learning algorithms are trained using labeled data, where the features (input variables) are
used to predict the target variable (class label). In the case of the Iris dataset, the target variable
represents the species of the iris flower. By training and testing the algorithms on the Iris dataset, we
can compare their performance in accurately classifying the iris flowers.
To compare the various supervised learning algorithms using the Iris dataset, we will analyze and
evaluate their performance in predicting the species of iris flowers based on their features. The Iris
dataset is a widely used dataset in machine learning and consists of measurements of four features
(sepal length, sepal width, petal length, and petal width) of 150 iris flowers from three different
species (Setosa, Versicolor, and Virginica).
❖ Step-by-Step Procedure:
1.) Load the Iris dataset: The Iris dataset can be loaded using libraries like NumPy and Pandas. It
consists of a 150x4 numpy.ndarray, with rows representing the samples and columns representing
the features.
2.) Choose appropriate algorithms: Select a variety of supervised learning algorithms suitable for
classification tasks. For the Iris dataset, commonly used algorithms are Logistic Regression, k-
Nearest Neighbors (k-NN), Support Vector Machines (SVM), Decision Trees, Random Forests,
Gradient Boosting (e.g., XGBoost or LightGBM), and Neural Networks (e.g., Multi-Layer
Perceptron).
3.) Split the dataset: Split the dataset into training and testing sets. Typically, a common split is
80% for training and 20% for testing. This can be done using the train_test_split function from the
scikit-learn library.
4.) Train and evaluate the algorithms: Train each algorithm on the training data and evaluate their
performance on the testing data. This involves fitting the model to the training data and making
predictions on the testing data.
5.) Compare the results: Compare the performance of each algorithm using appropriate evaluation
metrics such as accuracy, precision, recall, F1 score, and confusion matrix. These metrics provide
insights into the algorithm's ability to correctly classify the iris flowers.
Let's compare the various supervised learning algorithms on the Iris dataset:
1. Logistic Regression:
- Logistic regression is a simple and interpretable algorithm that models the probability of a
categorical outcome. It works well when the classes are linearly separable. In the case of the
Iris dataset, logistic regression can be effective because the classes are relatively well
separated, especially Setosa.
- Logistic regression can provide insights into the importance and influence of each feature on
the prediction. It's a good starting point for binary classification problems, but it can also
handle multi-class classification using techniques like one-vs-rest or softmax regression.
- Pros: It works well when the classes are linearly separable, provides insights into feature
importance, and is computationally efficient.
- Cons: It may not perform well on datasets with complex relationships and assumes a linear
relationship between the features and the classes.
2. k-Nearest Neighbors (k-NN):
- The k-NN algorithm classifies new instances based on their similarity to the training
instances. It is non-parametric and does not make assumptions about the underlying data
distribution.
- In the case of the Iris dataset, k-NN can capture the complex decision boundaries between the
classes, as the classes are well separated but not necessarily linearly separable.
- However, k-NN can be sensitive to the choice of the number of neighbors (k) and may suffer
from the curse of dimensionality if the feature space is large. It is also computationally
expensive when dealing with large datasets.
- Pros: It can capture complex decision boundaries, works well when the classes are not
linearly separable, and is easy to understand and implement.
- Cons: It is sensitive to the choice of k, computationally expensive with large datasets, and
suffers from the curse of dimensionality.
3. Support Vector Machines (SVM):
- SVM is a powerful algorithm that can handle linear and non-linear classification by finding
the best hyperplane or set of hyperplanes to separate the classes.
- In the case of the Iris dataset, SVM can effectively create non-linear decision boundaries by
using kernel functions such as radial basis function (RBF) kernel.
- SVMs are effective in high-dimensional spaces and can handle datasets with small sample
sizes.
- Pros: They can handle both linear and non-linear decision boundaries, are effective in high-
dimensional spaces, and provide good generalization performance.
- Cons: They can be computationally intensive for large datasets, require careful parameter
tuning, and may be sensitive to the choice of kernel.
4. Decision Trees:
- Decision trees are intuitive and interpretable models that recursively split the data based on
features to make predictions. They can handle both numerical and categorical features and
capture non-linear relationships.
- In the case of the Iris dataset, decision trees can learn decision rules based on the feature
values to classify the flowers accurately.
- However, decision trees are prone to overfitting, especially if the trees are deep and complex.
Techniques like pruning and setting a maximum depth can help alleviate this issue.
- Pros: They are easy to interpret, handle both numerical and categorical data, capture non-
linear relationships, and provide feature importance.
- Cons: They are prone to overfitting, especially with deep and complex trees, and can be
sensitive to small variations in the data.
5. Random Forests:
- Random forests are ensemble models that combine multiple decision trees to reduce
overfitting and improve generalization performance. Each tree is trained on a random subset
of the data, and the final prediction is made by aggregating the predictions of all the trees.
- Random forests can handle high-dimensional data well and are less prone to overfitting
compared to a single decision tree. They provide a measure of feature importance based on
the average impurity reduction across the trees.
- However, random forests can be computationally expensive due to training of multiple trees.
- Pros: They handle high-dimensional data well, provide feature importance, and are less prone
to overfitting compared to single decision trees.
- Cons: They are less interpretable than a single decision tree and can be computationally
expensive due to the training of multiple trees.
6. Gradient Boosting:
- Gradient boosting is another ensemble method that combines weak learners to create a strong
learner. It builds the model in a stage-wise manner, where each subsequent model corrects the
mistakes made by the previous models.
- Gradient boosting algorithms like XGBoost or LightGBM are known for their high predictive
power and the ability to capture complex interactions and non-linear relationships.
- Pros: It handles complex interactions and non-linear relationships, provides high predictive
power, and can capture subtle patterns in the data.
- Cons: It requires careful parameter tuning, can be computationally intensive, and may be
sensitive to overfitting.
7. Neural Networks:
- Neural networks, such as Multi-Layer Perceptrons (MLPs), consist of interconnected
layers of neurons that learn representations of the input data. Neural networks can handle
large amounts of data and adapt to different problem domains. They have shown
impressive performance on a wide range of tasks, including image classification and
NLP.
- Pros: They can learn complex patterns, handle large amounts of data, and adapt to
different problem domains.
- Cons: They require a large amount of data and computational resources, are prone to
overfitting, and require careful selection of network architecture and hyperparameters.
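A compact way to run the comparison described above is sketched below, assuming an 80/20 split and default hyperparameters (accuracy alone is reported here; the other metrics from step 5 can be added the same way):
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier
from sklearn.neural_network import MLPClassifier

X, y = load_iris(return_X_y=True)
x_train, x_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

models = {
    'Logistic Regression': LogisticRegression(max_iter=200),
    'k-NN': KNeighborsClassifier(n_neighbors=5),
    'SVM (RBF kernel)': SVC(kernel='rbf'),
    'Decision Tree': DecisionTreeClassifier(random_state=0),
    'Random Forest': RandomForestClassifier(random_state=0),
    'Gradient Boosting': GradientBoostingClassifier(random_state=0),
    'Neural Network (MLP)': MLPClassifier(max_iter=1000, random_state=0),
}

# Train each model and report training/test accuracy for the final table.
for name, model in models.items():
    model.fit(x_train, y_train)
    print(f'{name}: train accuracy = {model.score(x_train, y_train):.3f}, '
          f'test accuracy = {model.score(x_test, y_test):.3f}')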
❖ Final Table:
Algorithm | Training Set Accuracy | Test Set Accuracy
Practical - 10
Aim: Compare various unsupervised learning algorithms using appropriate data.
Theory:
❖ What is Unsupervised Learning?
- Unsupervised learning is a type of machine learning where the algorithm does not have any
labeled training data. This means that the algorithm must learn to find patterns in the data
without any prior knowledge. Unsupervised learning is often used for tasks such as
clustering, dimensionality reduction, and anomaly detection. The goal is to uncover patterns,
structures, and relationships within data without the use of explicit labels or target values.
Unlike supervised learning, where the algorithm learns from labeled examples to make
predictions, unsupervised learning works with unlabeled data to find hidden insights,
groupings, or representations.
9.) Adjust Parameters: If necessary, adjust algorithm parameters to see how they impact the
clustering results. This can help you fine-tune the algorithms for optimal performance.
10.) Choose the Best Fit: Based on your evaluation and interpretation, select the algorithm that best
fits the characteristics of your dataset and the insights you seek to gain.
❖ Let's compare the various unsupervised learning algorithms on the Iris dataset:
1.) K-Means Clustering:
K-Means is a widely used clustering algorithm that aims to partition data points into K distinct
clusters. The algorithm follows these steps:
1. Initialization: K initial cluster centroids are randomly chosen from the data points or based on
some predefined criteria.
2. Assignment: Each data point is assigned to the cluster whose centroid is nearest to it. The
distance is often calculated using Euclidean distance.
3. Update: The centroids of the clusters are recalculated as the mean of the data points assigned
to each cluster.
4. Iteration: The assignment and update steps are performed iteratively until convergence is reached.
Pros:
- Simplicity: K-Means is easy to implement and computationally efficient.
- Scalability: It can handle large datasets effectively.
- Fast convergence: K-Means typically converges quickly.
Cons:
- Sensitive to initialization: The algorithm's performance can be influenced by the initial
placement of centroids.
- Assumes spherical clusters: K-Means assumes that clusters are spherical and equally sized,
which might not hold true for all datasets.
2.) Hierarchical Clustering: Hierarchical clustering builds a hierarchy of clusters by iteratively
merging or splitting clusters. It operates with two main strategies:
- Agglomerative: Start with each data point as a separate cluster and iteratively merge the
closest clusters.
- Divisive: Start with all data points in one cluster and iteratively split clusters into smaller clusters.
- The algorithm forms a tree-like structure (a dendrogram), which visualizes the hierarchy of clusters.
Pros:
- No need to specify the number of clusters: Hierarchical clustering doesn't require you to
specify the number of clusters beforehand.
- Captures various scales: The algorithm can identify clusters at different scales.
Cons:
- Computationally intensive: Hierarchical clustering can be slow, especially on large datasets.
- Lack of flexibility: Once clusters are merged or split, it's difficult to reverse the process.
3.) DBSCAN (Density-Based Spatial Clustering of Applications with Noise): DBSCAN groups
data points based on their density in the feature space. It distinguishes between core points, border
points, and noise points:
- Core points: Data points with at least 'min_samples' points within a distance of 'eps' are
considered core points.
- Border points: Data points within the 'eps' distance of a core point but with fewer than
'min_samples' points in their own neighbourhood.
- Noise points: Data points that are neither core nor border points.
Pros:
- Can identify arbitrary-shaped clusters: DBSCAN is effective for clusters with complex shapes.
- Noise tolerance: It can automatically identify and ignore noise points.
Cons:
- Sensitive to hyperparameters: The 'eps' and 'min_samples' parameters need to be set
carefully.
- Difficulty with varying density: DBSCAN may struggle with clusters of varying densities.
4.) Gaussian Mixture Model (GMM): GMM is a probabilistic model that assumes data points are
generated from a mixture of Gaussian distributions. It uses the Expectation-Maximization (EM)
algorithm to estimate the parameters of the Gaussians and the cluster assignments.
Pros:
- Flexible cluster shapes: GMM can model clusters of various shapes and sizes.
- Soft clustering: GMM assigns probabilities of data points belonging to clusters, providing a
measure of uncertainty.
Cons:
- Computationally intensive: GMM involves iterative EM steps and can be slower than other methods.
- Sensitive to initialization: Like K-Means, GMM's performance can be influenced by
initialization.
❖ Code:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.datasets import load_iris
from sklearn.preprocessing import StandardScaler
from sklearn.cluster import KMeans, AgglomerativeClustering, DBSCAN
from sklearn.mixture import GaussianMixture
from sklearn.metrics import silhouette_score
data = load_iris()
X = data.data[:, [0, 1]] # Using Sepal Length and Sepal Width features
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)
algorithms = {
'K-Means': KMeans(n_clusters=3),
'Hierarchical': AgglomerativeClustering(n_clusters=3),
'DBSCAN': DBSCAN(eps=0.5, min_samples=5),
'GMM': GaussianMixture(n_components=3)
}
print("===")
❖ Final Table:
Algorithm | Training Set Silhouette Score | Test Set Silhouette Score
K-Means | 0.438871 | 0.438871
Hierarchical Clustering | 0.438600 | 0.438600
DBSCAN | 0.391959 | 0.391959
GMM | 0.436065 | 0.436065
Table – 10.1: Silhouette Score Comparison
Algorithm | Training Set Precision | Test Set Precision
K-Means | 0.544401 | 0.544401
Hierarchical Clustering | 0.793902 | 0.793902
DBSCAN | 0.107981 | 0.107981
GMM | 0.109656 | 0.109656
Table – 10.2: Precision Comparison
Algorithm | Training Set Recall | Test Set Recall
K-Means | 0.546667 | 0.546667
Hierarchical Clustering | 0.786667 | 0.786667
DBSCAN | 0.306667 | 0.306667
GMM | 0.100000 | 0.100000
Table – 10.3: Recall Comparison
Algorithm | Training Set F1 Score | Test Set F1 Score
K-Means | 0.544467 | 0.544467
Hierarchical Clustering | 0.786110 | 0.786110
DBSCAN | 0.159722 | 0.159722
GMM | 0.104535 | 0.104535
Table – 10.4: F1 Score Comparison
❖ Final Output
❖ Conclusion:
In this comparative analysis of unsupervised learning algorithms using the Iris dataset, K-Means and
Hierarchical Clustering exhibited higher Silhouette Scores, indicating well-defined clusters, while
DBSCAN and Gaussian Mixture Models (GMM) performed relatively less optimally. When treated
as pseudo-labels, K-Means and Hierarchical Clustering demonstrated better precision, recall, and F1
scores, reflecting their ability to generate distinct clusters.