ML_final
210170111094
ENROLLMENT NO: 210170111039
Introduction to Machine Learning (3171114) NAKUM RUTURAJ
210170111094
Submitted by
NAKUM RUTURAJ
210170111094
BACHELOR OF ENGINEERING
In
2024-2025
Vishwakarma Government
Engineering College, Chandkheda
Certificate
Place:
Date:
Preface
The main aim of any laboratory, practical, or field work is to enhance the required skills and to
build students' ability to solve real-world problems by developing relevant competencies in the
psychomotor domain. With this in view, GTU has designed a competency-focused, outcome-based
curriculum for engineering degree programs in which sufficient weightage is given to practical work.
This shows the importance of enhancing students' skills, and it calls on students, instructors, and
faculty members to use every second of the time allotted for practicals to achieve relevant outcomes
by performing the experiments rather than treating them as mere study-type exercises. For effective
implementation of a competency-focused, outcome-based curriculum, every practical must be carefully
designed to serve as a tool to develop and enhance, in every student, the competencies required by
industry. These psychomotor skills are very difficult to develop through the traditional
chalk-and-board method of content delivery in the classroom. Accordingly, this lab manual is designed
to focus on industry-defined outcomes rather than on the old practice of conducting practicals to
prove concepts and theory.
By using this lab manual, students can go through the relevant theory and procedure in advance of
the actual performance, which creates interest and gives them a basic idea of the experiment
beforehand. This in turn helps students achieve the pre-determined outcomes. Each experiment in
this manual begins with the competency, course outcomes, and practical outcomes (objectives).
Students will also learn the safety measures and necessary precautions to be taken while performing
the practical.
This manual also provides guidelines for faculty members to facilitate student-centric lab activities
in each experiment by arranging and managing the necessary resources, so that students follow the
procedures with the required safety and necessary precautions to achieve the outcomes. It also
explains how students will be assessed by providing rubrics.
This lab manual focuses on the development of computer programs for machine learning that can
change when exposed to new data. In this manual we will see the basics of machine learning and the
implementation of a simple machine-learning algorithm using Python.
Machine learning is a method of teaching computers to learn from data, without being explicitly
programmed. Python is a popular programming language for machine learning because it has a large
number of powerful libraries and frameworks that make it easy to implement machine learning
algorithms.
To get started with machine learning using Python, you will need to have a basic understanding of
Python programming and some knowledge of mathematical concepts such as probability, statistics,
and linear algebra.
Utmost care has been taken while preparing this lab manual; however, there is always scope for
improvement. Therefore, we welcome constructive suggestions for improvement and the removal of
errors, if any.
Sr. No.  Aim / Objective(s) of Experiment  CO1 CO2 CO3 CO4 CO5
1.  Introduction to Libraries used for Machine Learning.  √
2.  Create dataset of 12 samples each having 10 features. Split it into training and testing dataset.  √ √
The following industry-relevant competencies are expected to be developed in the student by
undertaking the practical work of this laboratory.
1.
2.
Index
(Progressive Assessment Sheet)
EXPERIMENT NO: 1
TensorFlow Library:
Here is a first example of TensorFlow. It shows how to define constants and perform a
computation with them. In TensorFlow 1.x the computation had to be run inside a session;
TensorFlow 2.x executes it eagerly.
# Import `tensorflow`
import tensorflow as tf
# Initialize two constants
x1 = tf.constant([1,2,3,4])
x2 = tf.constant([5,6,7,8])
# Multiply
result = tf.multiply(x1, x2)
# Print the result
print(result)
Scikit-learn Library:
Scikit-learn is an open source machine learning library that supports supervised and
unsupervised learning. It also provides various tools for model fitting, data preprocessing,
model selection, model evaluation, and many other utilities. Scikit-learn provides dozens of
built-in machine learning algorithms and models, called estimators. Each estimator can be
fitted to some data using its fit method.
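The estimator interface described above can be sketched as follows. This is a minimal illustration, assuming a tiny made-up dataset (the arrays X and y are not taken from the lab dataset) and LogisticRegression as an arbitrary example of an estimator.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical toy data: 6 samples, 2 features, binary labels
X = np.array([[0.0, 0.1], [0.2, 0.0], [0.1, 0.3],
              [1.0, 0.9], [0.9, 1.1], [1.2, 1.0]])
y = np.array([0, 0, 0, 1, 1, 1])

# Every scikit-learn estimator exposes fit(); predictors also expose predict()
clf = LogisticRegression()
clf.fit(X, y)

# Predict the class of two new, made-up samples
pred = clf.predict(np.array([[0.05, 0.1], [1.1, 1.0]]))
print(pred)
```

Other estimators (decision trees, SVMs, random forests) follow the same fit/predict pattern, which is why they can be swapped with few code changes.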
The library is built upon SciPy (Scientific Python), which must be installed before you can
use scikit-learn; it is part of a wider stack of scientific Python packages.
Extensions or modules for SciPy are conventionally named SciKits. As such, the module that
provides learning algorithms is named scikit-learn.
PyTorch Library:
PyTorch is a Python-based scientific computing package serving two broad purposes:
A replacement for NumPy that uses the power of GPUs and other accelerators.
An automatic differentiation library that is useful for implementing neural networks.
PyTorch is closely related to the Lua-based Torch framework, which is actively used at
Facebook.
Easy interface: PyTorch offers an easy-to-use API, so it is considered very simple to
operate, and it runs on Python. Code execution in this framework is straightforward.
Python usage: The library is considered Pythonic and integrates smoothly
with the Python data science stack. Thus, it can leverage all the services and
functionality offered by the Python environment.
Computational graphs: PyTorch offers dynamic computational graphs, which a user
can change at runtime. This is highly useful when a developer has no idea how much
memory will be required for creating a neural network model.
PyTorch tensors: Tensors are a specialized data structure very similar to arrays and
matrices. They are similar to NumPy's ndarrays, except that tensors can run on GPUs or
other specialized hardware to accelerate computing.
NumPy is a popular open source library used for mathematical and scientific computing in
Python. Instead of reinventing the wheel, PyTorch interoperates really well with NumPy to
leverage its existing ecosystem of tools and libraries.
Here's how we create an array using NumPy and convert it to a PyTorch tensor:
import numpy as np
import torch
>> x = np.array([[1, 2], [3, 4.]])
>> x
array([[1., 2.],
       [3., 4.]])
>> y = torch.from_numpy(x)
>> y
tensor([[1., 2.],
        [3., 4.]], dtype=torch.float64)
Conclusion:
Question:
Rubrics:
EXPERIMENT NO: 2
Relevant CO: Comprehend basic concepts of neural network and its use in machine learning.
Objectives:
import pandas as pd
# Load the CSV file (make sure the file path is correct)
student_data = pd.read_csv('student_placement.csv')
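The aim listed for this experiment in the index (create a dataset of 12 samples with 10 features each and split it into training and testing sets) can be sketched as below. This is only an illustration: the random values, the placeholder column names feature_1 to feature_10, the binary label and the 25% test share are all assumptions, not part of the original listing.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(seed=0)  # fixed seed so the run is repeatable

# 12 samples x 10 features of random values; column names are placeholders
X = pd.DataFrame(rng.random((12, 10)),
                 columns=[f"feature_{i}" for i in range(1, 11)])
# Hypothetical binary label for each sample
y = pd.Series(rng.integers(0, 2, size=12), name="label")

# Hold out 25% of the rows (3 of 12) for testing
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42)

print(X_train.shape, X_test.shape)
```

Fixing random_state makes the split reproducible, which helps when comparing results between runs.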
OUTPUT:
Conclusion:
Rubrics:
Rubrics 1 2 3 4 5
EXPERIMENT NO: 3
Date:
Competency and Practical Skills: Python, Scikit-learn
Relevant CO: Learn and implement various basic machine learning algorithms.
Exploratory Data Analysis (EDA) is a technique for analyzing data using visual techniques.
With it, we can obtain a detailed statistical summary of the data. We can also deal with
duplicate values and outliers, and spot trends or patterns present in the dataset.
Example :
import pandas as pd
# Load the placement dataset
df = pd.read_csv('data.csv')
# Display the first 5 rows
print(df.head())
The describe() function computes basic statistics on the dataset, such as extreme
values, the count of data points, and the standard deviation. Any missing or NaN value is
automatically skipped. describe() gives a good picture of the distribution of the data.
Example:
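A minimal sketch of describe(), assuming a small made-up DataFrame in place of the placement data (the values below are illustrative only):

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the placement data; values are made up
df = pd.DataFrame({
    "CGPA": [6.5, 7.2, 8.1, np.nan, 9.0],
    "IQ": [100, 110, 105, 120, 115],
})

summary = df.describe()
print(summary)

# The NaN in CGPA is skipped, so its count is lower than the number of rows
print(summary.loc["count", "CGPA"])
```

Comparing the count row against len(df) is a quick way to spot columns with missing values.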
Checking Duplicates
Let’s see whether our dataset contains any duplicates. The pandas drop_duplicates() method
removes duplicate rows from the data frame.
Example:
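A minimal sketch of checking for and removing duplicates, assuming a small made-up DataFrame (not the placement data):

```python
import pandas as pd

# Hypothetical data in which the second row repeats the first
df = pd.DataFrame({
    "CGPA": [7.0, 7.0, 8.5, 6.2],
    "IQ": [110, 110, 120, 95],
})

# duplicated() flags rows that repeat an earlier row
n_duplicates = int(df.duplicated().sum())
print("Duplicate rows:", n_duplicates)

# drop_duplicates() returns a new DataFrame; the original is unchanged
deduped = df.drop_duplicates()
print(deduped.shape)
```

Counting with duplicated() first shows how many rows would be lost before anything is actually dropped.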
Next, let’s see whether the dataset is balanced, i.e. whether each class of the target column
(Placed in the placement dataset) contains roughly the same number of rows. We will use
the Series.value_counts() function, which returns a Series containing the counts of unique
values.
Example:
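A minimal sketch of value_counts(), assuming a made-up Placed column (1 = placed, 0 = not placed):

```python
import pandas as pd

# Hypothetical target column; the values are illustrative only
placed = pd.Series([1, 0, 1, 1, 0, 1], name="Placed")

# Absolute number of rows per class
counts = placed.value_counts()
print(counts)

# normalize=True returns the share of each class instead of the raw count
share = placed.value_counts(normalize=True)
print(share)
```

A large gap between the class shares indicates an imbalanced dataset, which affects how accuracy should be interpreted later.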
Data Visualization
We will use the Matplotlib and Seaborn libraries for data visualization.
Example:
plt.show()
We will see the relationship between CGPA and IQ, and also between Resume score and
Internship.
# Scatter plot with 'CGPA' on x-axis, 'IQ' on y-axis, and 'Placed' as the hue
sns.scatterplot(x='CGPA',
y='IQ',
hue='Placed',
data=df)
plt.show()
Students with lower CGPA but higher IQ might still get placed.
Students with moderate CGPA and IQ lie in the middle of those who are placed or not.
Students with higher CGPA and lower IQ may or may not get placed depending on
other factors like Resume_Score or Internship.
# importing packages
import seaborn as sns
import matplotlib.pyplot as plt
# Scatter plot with 'Resume_Score' on x-axis, 'Internship' on y-axis, and 'Placed' as the hue
sns.scatterplot(x='Resume_Score', y='Internship', hue='Placed', data=df)
plt.show()
Students with lower Resume Scores and no Internship experience (0) are less likely to
get placed.
Students with moderate Resume Scores and Internship experience (1) fall in the
middle in terms of placement likelihood.
Students with higher Resume Scores and Internship experience are more likely to get
placed.
Let’s plot the relationships between all columns using a pairplot. It can be used for
multivariate analysis.
Example:
sns.pairplot(df, hue='Placed', height=2)
plt.show()
Histograms
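Histograms show the distribution of data for various columns. They can be used for
univariate as well as bivariate analysis.
# importing packages
import matplotlib.pyplot as plt
fig, axes = plt.subplots(2, 2, figsize=(10, 10))
axes[0, 0].set_title("CGPA")
axes[0, 1].set_title("Resume_Score")
axes[1, 0].set_title("IQ")
axes[1, 1].set_title("Internship")
# draw one histogram per column, in the same order as the titles
for ax, col in zip(axes.flat, ["CGPA", "Resume_Score", "IQ", "Internship"]):
    ax.hist(df[col])
plt.tight_layout()
plt.show()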
CGPA: The highest frequency of CGPA is between 7.0 and 8.0, indicating that most
students have CGPAs in this range.
Resume Score: The highest frequency of Resume Scores is around 80, suggesting
that a majority of students score within the range of 70 to 90.
IQ: The highest frequency of IQ scores is around 110, indicating that many students
have IQs in the range of 105 to 115.
Internship: The highest frequency of Internship experience is around 1.0, showing
that most students have completed internships.
A distribution plot (distplot; sns.histplot in current Seaborn, where distplot is deprecated)
is used for a univariate set of observations and visualizes it through a histogram, so we
choose one particular column of the dataset at a time.
Example:
# Importing packages
import seaborn as sns
import matplotlib.pyplot as plt
# one FacetGrid per column, coloured by placement status
for col in ["CGPA", "Resume_Score", "IQ", "Internship"]:
    plot = sns.FacetGrid(df, hue="Placed")
    plot.map(sns.histplot, col).add_legend()
plt.show()
Resume Score: Similar to CGPA, the Resume Score distributions show considerable
overlap, suggesting that it is also not a decisive factor in placement outcomes.
IQ: There is a noticeable amount of overlap, but the distributions for placed and
unplaced students begin to diverge slightly, indicating that IQ may play a role in
placement decisions.
Internship: There is a distinct separation between the distributions for placed and
unplaced students, suggesting that having an internship significantly impacts
placement status.
Handling Correlation
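Pandas dataframe.corr() finds the pairwise correlation of all columns in the
dataframe. Any NA values are automatically excluded. Non-numeric columns cannot be
correlated: older pandas versions ignored them silently, while pandas 2.0 and later raise an
error unless they are dropped first or numeric_only=True is passed.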
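A minimal sketch of computing pairwise correlations, assuming a small made-up DataFrame with one non-numeric column (the Branch column is hypothetical). Selecting the numeric columns explicitly works regardless of pandas version:

```python
import pandas as pd

# Hypothetical data: IQ is perfectly linear in CGPA; Branch is non-numeric
df = pd.DataFrame({
    "CGPA": [6.0, 7.0, 8.0, 9.0],
    "IQ": [100, 105, 110, 115],
    "Branch": ["CE", "IT", "CE", "EC"],
})

# Keep only the numeric columns, then compute the Pearson correlation matrix
corr = df.select_dtypes(include="number").corr()
print(corr)
```

The result is a square matrix with ones on the diagonal; values near +1 or -1 off the diagonal indicate strongly related columns.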
Heatmaps
The heatmap is a data visualization technique that is used to analyze the dataset as colors in
two dimensions. Basically, it shows a correlation between all numerical variables in the
dataset. In simpler terms, we can plot the above-found correlation using the heatmaps.
Example:
# Importing packages
import seaborn as sns
import matplotlib.pyplot as plt
sns.heatmap(df.corr(numeric_only=True), annot=True)
plt.show()
Box Plots
We can use boxplots to see how a categorical value is distributed against the numerical
values.
Example:
# importing packages
import seaborn as sns
import matplotlib.pyplot as plt
def graph(y):
    # box plot of column y, grouped by placement status
    sns.boxplot(x="Placed", y=y, data=df)
plt.figure(figsize=(10,10))
# grid position
plt.subplot(221)
graph('CGPA')
plt.subplot(222)
graph('IQ')
plt.subplot(223)
graph('Resume_Score')
plt.subplot(224)
graph('Internship')
plt.show()
Let’s consider the placement dataset and plot the boxplot for the CGPA column.
Example:
# importing packages
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
df = pd.read_csv('data.csv')
sns.boxplot(x='CGPA', data=df)
plt.show()
# Remove outliers with the IQR rule
Q1 = df['CGPA'].quantile(0.25)
Q3 = df['CGPA'].quantile(0.75)
IQR = Q3 - Q1
# row positions of values outside the 1.5 * IQR whiskers
upper = np.where(df['CGPA'] >= Q3 + 1.5 * IQR)
lower = np.where(df['CGPA'] <= Q1 - 1.5 * IQR)
df.drop(upper[0], inplace=True)
df.drop(lower[0], inplace=True)
sns.boxplot(x='CGPA', data=df)
plt.show()
Conclusion:
Question :
1. What is EDA?
2. Write a code for comparing Sepal length and sepal width.
3. Explain Histogram.
4. Define the following functions: (i) Heatmaps (ii) Box plot (iii) describe (iv)
checking duplicates.
5. Explain the libraries used for data visualization.
Rubrics:
Rubrics 1 2 3 4 5
Objectives:
Conclusion:
3. Find coefficient, mean squared error and variance score of linear regression
Code:
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
# Step 1: Load your dataset
data = pd.read_csv('data.csv')
# Step 2: Extract CGPA (independent variable) and Placed (dependent variable)
X = data['CGPA'].values.reshape(-1, 1) # Reshape to 2D array for sklearn
y = data['Placed'].values
# Step 3: Create and train the linear regression model
model = LinearRegression()
model.fit(X, y)
# Step 4: Make predictions using the trained model
y_pred = model.predict(X)
# Step 5: Plot the scatter plot and regression line
plt.scatter(X, y, color='blue', label='Data points') # Original data points
plt.plot(X, y_pred, color='red', label='Regression Line') # Regression line
plt.xlabel('CGPA (Independent Variable)')
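The coefficient, mean squared error and variance score named in the aim can be computed as in the sketch below. It assumes made-up data lying exactly on a straight line instead of data.csv, so the numbers it prints say nothing about the placement dataset.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score

# Hypothetical data lying exactly on y = 2x + 1
X = np.array([[1.0], [2.0], [3.0], [4.0], [5.0]])
y = 2.0 * X.ravel() + 1.0

model = LinearRegression().fit(X, y)
y_pred = model.predict(X)

mse = mean_squared_error(y, y_pred)
r2 = r2_score(y, y_pred)

print("Coefficient:", model.coef_[0])
print("Intercept:", model.intercept_)
print("Mean squared error:", mse)
print("Variance score (R^2):", r2)
```

On real data the error will be non-zero and R^2 below 1; with a binary target such as Placed, logistic regression is usually the better fit.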
Conclusion:
Question:
Rubrics 1 2 3 4 5
EXPERIMENT NO: 5
Conclusion:
Question:
Rubrics:
Rubrics 1 2 3 4 5
EXPERIMENT NO: 6
2. Plot tree.
Code:
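import pandas as pd
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier, plot_tree
from sklearn.metrics import confusion_matrix, ConfusionMatrixDisplay
data = pd.read_csv('data.csv')
print("Dataset Preview:")
print(data.head())
X = data[['CGPA', 'IQ']]
y = data['Placed']
# Split into training and testing sets (test_size and random_state are assumed values)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
decision_tree = DecisionTreeClassifier(random_state=42)
decision_tree.fit(X_train, y_train)
# Make predictions
y_pred_tree = decision_tree.predict(X_test)
cm = confusion_matrix(y_test, y_pred_tree)
disp = ConfusionMatrixDisplay(confusion_matrix=cm)
disp.plot(cmap=plt.cm.Greens)
plt.show()
plt.figure(figsize=(12, 8))
plot_tree(decision_tree, feature_names=['CGPA', 'IQ'], filled=True)
plt.title('Decision Tree Structure')
plt.show()
# Example: [ 8.9, 129] => Replace with actual input values based on your dataset
new_student = pd.DataFrame([[8.9, 129]], columns=['CGPA', 'IQ'])
prediction = decision_tree.predict(new_student)
print("Prediction:", prediction[0])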
Output:
Conclusion:
Question:
1. Explain decision tree algorithm.
2. Explain classification of machine learning.
3. Define confusion matrix and its importance.
4. How can we plot a decision tree using Python?
Rubrics:
Rubrics 1 2 3 4 5
EXPERIMENT NO: 7
Code:
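import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, confusion_matrix, ConfusionMatrixDisplay
data = pd.read_csv('data.csv')
# Assuming 'Placed' is the target variable and all others are features
categorical_cols = data.select_dtypes(include=['object']).columns.tolist()
data = pd.get_dummies(data, columns=categorical_cols, drop_first=True)
X = data.drop('Placed', axis=1)
y = data['Placed']
# Step 1: Split the dataset into training and testing sets (80% train, 20% test)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
model = LogisticRegression()
model.fit(X_train, y_train)
y_pred = model.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
print("Accuracy:", accuracy)
conf_matrix = confusion_matrix(y_test, y_pred)
# pass the axes explicitly so the matrix is drawn on the sized figure
fig, ax = plt.subplots(figsize=(8, 6))
ConfusionMatrixDisplay(confusion_matrix=conf_matrix).plot(cmap=plt.cm.Blues, ax=ax)
plt.title('Confusion Matrix')
plt.show()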
OUTPUT:
Conclusion:
Question:
1. What is logistic regression?
2. Compare linear and logistic regression.
3. What are the different types of logistic regression?
4. What are the advantages of logistic regression?
Rubrics:
Rubrics 1 2 3 4 5
EXPERIMENT NO: 8
Code:
Conclusion:
Question:
1. Explain the Support Vector Machine algorithm.
2. Explain different types of kernel functions.
3. What do you know about hard margin SVM and soft margin SVM?
4. What is hinge loss?
Rubrics:
Rubrics 1 2 3 4 5
EXPERIMENT NO: 9
Code:
# Import required libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.metrics import classification_report, confusion_matrix, accuracy_score
# Load the dataset from the uploaded CSV file
file_path = 'data.csv' # Update the file path if necessary
iris_df = pd.read_csv(file_path)
# Check for non-numeric columns and display the DataFrame info
print(iris_df.info())
# Convert categorical columns to numeric using one-hot encoding
iris_df = pd.get_dummies(iris_df, drop_first=True)
# Split dataset into features (X) and target (y)
X = iris_df.drop('Placed', axis=1) # Assuming 'Placed' is the target variable
y = iris_df['Placed'] # Target variable
# Step 1: Split the dataset into training and testing sets (80% train, 20% test)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# Step 2: Create and train the SVM classifier with an RBF kernel
svm_classifier_rbf = SVC(kernel='rbf', random_state=42)
svm_classifier_rbf.fit(X_train, y_train)
# Step 3: Make predictions
y_pred = svm_classifier_rbf.predict(X_test)
# Step 4: Generate classification report, confusion matrix, and accuracy score
classification_rep = classification_report(y_test, y_pred)
conf_matrix = confusion_matrix(y_test, y_pred)
accuracy = accuracy_score(y_test, y_pred)
# Print the results
print("Classification Report:\n", classification_rep)
print("Confusion Matrix:\n", conf_matrix)
print("Accuracy:", accuracy)
# Step 5: Plotting the decision boundary and scatter plot
# For visualization, we will use only the first two features
X_vis = X.iloc[:, :2] # Using the first two features for 2D visualization
# Train the SVM model on the reduced dataset
svm_classifier_vis = SVC(kernel='rbf', random_state=42)
svm_classifier_vis.fit(X_vis, y)
# Create a mesh grid for plotting decision boundaries
h = .02 # step size in the mesh
x_min, x_max = X_vis.iloc[:, 0].min() - 1, X_vis.iloc[:, 0].max() + 1
y_min, y_max = X_vis.iloc[:, 1].min() - 1, X_vis.iloc[:, 1].max() + 1
xx, yy = np.meshgrid(np.arange(x_min, x_max, h), np.arange(y_min, y_max, h))
# Plot the decision boundary
Z = svm_classifier_vis.predict(np.c_[xx.ravel(), yy.ravel()])
Z = Z.reshape(xx.shape)
plt.figure(figsize=(12, 6))
plt.contourf(xx, yy, Z, alpha=0.3)
plt.scatter(X_vis.iloc[:, 0], X_vis.iloc[:, 1], c=y, edgecolors='k', marker='o')
plt.title('SVM Classifier with RBF Kernel')
plt.show()
Conclusion:
Question:
1. What are some applications of SVM?
2. List out some advantages of SVM.
3. What is hyperplane in SVM?
Rubrics:
Rubrics 1 2 3 4 5
EXPERIMENT NO: 10
Code:
data_df = pd.read_csv(file_path)
print(data_df.head())
scaler = StandardScaler()
plt.figure(figsize=(10, 6))
plt.show()
# Applying PCA
data_pca = pca.fit_transform(scaled_data)
plt.figure(figsize=(10, 6))
plt.show()
print(data_pca[:5]) # fit_transform returns a NumPy array, which has no head() method
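The standardize-then-project steps used above can be sketched end to end as follows. This is an illustration only: it assumes random synthetic data rather than the lab's CSV file, and the choice of 2 components is arbitrary.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
# Hypothetical data: 50 samples, 4 features, two of them strongly correlated
base = rng.normal(size=(50, 1))
X = np.hstack([base,
               2.0 * base + rng.normal(scale=0.1, size=(50, 1)),
               rng.normal(size=(50, 2))])

# Standardize so each column has zero mean and unit variance
scaled = StandardScaler().fit_transform(X)

# Project onto the first two principal components
pca = PCA(n_components=2)
X_pca = pca.fit_transform(scaled)

print(X_pca.shape)
# Share of the total variance captured by each component
ratio = pca.explained_variance_ratio_
print(ratio)
```

The explained variance ratio is a common guide for deciding how many components to keep.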
OUTPUT:
Conclusion:
Question:
1. What is Dimensionality Reduction?
2. What is PCA? What does PCA do?
3. What are the advantages of Dimensionality Reduction.
4. List down the steps of a PCA algorithm.
EXPERIMENT NO: 11
Code:
# Importing the libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn import metrics
# Load the dataset
file_path = 'data.csv' # Update with your actual file path
data = pd.read_csv(file_path)
# Display the first few rows of the dataset
print(data.head())
# Define features and target variable
X = data[['CGPA', 'IQ', 'Resume_Score', 'Internship']] # Features
y = data['Placed'] # Target variable
# Split dataset into training set and test set (70% training and 30% test)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)
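Once the data is split, the remaining steps are to train the forest, predict on the test set and measure accuracy. A minimal sketch, assuming synthetic data from make_classification in place of data.csv; the hyperparameters shown are illustrative and are not taken from the original listing.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn import metrics

# Synthetic stand-in for the placement data: 4 features, binary target
X, y = make_classification(n_samples=200, n_features=4, n_informative=3,
                           n_redundant=1, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42)

# n_estimators=100 is scikit-learn's default, written out for clarity
clf = RandomForestClassifier(n_estimators=100, random_state=42)
clf.fit(X_train, y_train)
y_pred = clf.predict(X_test)

accuracy = metrics.accuracy_score(y_test, y_pred)
importances = clf.feature_importances_
print("Accuracy:", accuracy)
print("Feature importances:", importances)
```

The feature importances sum to 1 and give a rough ranking of which inputs the forest relied on most.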
Conclusion:
Question:
1. What do you mean by the random forest algorithm?
2. What do you mean by bagging?
3. What does random refer to in ‘Random Forest’?
4. List down the advantages and disadvantages of Random Forest algorithm.
Rubrics:
Rubrics 1 2 3 4 5
EXPERIMENT NO: 12
K means Clustering
DATE:
Code:
# Import libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score
from sklearn.preprocessing import MinMaxScaler
# Reading the dataset
file_path = 'data.csv' # Update the file path to your dataset
data = pd.read_csv(file_path)
# Display information about the dataset
print(data.info())
print(data.head(10))
# Frequency distribution of 'Placed' (assuming 'Placed' is the categorical outcome)
outcome_distribution = pd.crosstab(index=data["Placed"], columns="count")
print(outcome_distribution)
# If you want to separate the data based on 'Placed', do it like this
placed = data[data["Placed"] == 1]
not_placed = data[data["Placed"] == 0]
# Drop non-numeric columns for correlation analysis
data_numeric = data.select_dtypes(include=[np.number]) # Select only numeric columns
# Check if the numeric data is empty
if data_numeric.empty:
    print("No numeric columns available for correlation analysis.")
else:
    # Visualize correlation using a heatmap
    plt.figure(figsize=(10, 6))
    sns.heatmap(data_numeric.corr(), cmap='Blues', annot=True)
    plt.title('Correlation Heatmap')
    plt.show()
# Prepare data for clustering (if desired)
# Selecting only the relevant features for clustering
x = data[['CGPA', 'IQ', 'Resume_Score', 'Internship']].values
# Normalize the features
scaler = MinMaxScaler()
x_scaled = scaler.fit_transform(x)
# KMeans clustering
kmeans = KMeans(n_clusters=2, random_state=42) # Set n_clusters based on your data
kmeans.fit(x_scaled)
data['Cluster'] = kmeans.labels_
# Calculate silhouette score
silhouette_avg = silhouette_score(x_scaled, kmeans.labels_)
print("Silhouette Score:", silhouette_avg)
# Plotting the clusters
plt.figure(figsize=(10, 6))
plt.scatter(x_scaled[:, 0], x_scaled[:, 1], c=data['Cluster'], cmap='viridis', marker='o')
plt.title('Clusters of Students')
plt.xlabel('Feature 1 (CGPA)')
plt.ylabel('Feature 2 (IQ)')
plt.colorbar(label='Cluster')
plt.grid()
plt.show()
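The listing above fixes n_clusters=2. One common way to choose the number of clusters is to compare silhouette scores for several values of k, sketched below. The two synthetic blobs are an assumption made for illustration; they are not the placement data.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

rng = np.random.default_rng(42)
# Hypothetical data: two well-separated blobs in 2D
X = np.vstack([rng.normal(loc=0.0, scale=0.3, size=(50, 2)),
               rng.normal(loc=5.0, scale=0.3, size=(50, 2))])

# Fit KMeans for several k and record the average silhouette score of each
scores = {}
for k in range(2, 6):
    labels = KMeans(n_clusters=k, n_init=10, random_state=42).fit_predict(X)
    scores[k] = silhouette_score(X, labels)

# The k with the highest silhouette score gives the best-separated clusters
best_k = max(scores, key=scores.get)
print(scores)
print("Best k by silhouette:", best_k)
```

Silhouette scores range from -1 to 1; values close to 1 mean points sit well inside their own cluster and far from the others.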
OUTPUT:
Conclusion:
Question:
1. Explain K means Clustering Algorithm.
2. Why do you prefer Euclidean distance over Manhattan distance in the K means algorithm?
3. List out advantages and disadvantages of K means clustering.
4. Difference between K means clustering and KNN algorithm.
Rubrics:
Rubrics 1 2 3 4 5
EXPERIMENT NO: 13
2. Find accuracy.
Code:
# Importing the libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn import metrics
from sklearn.metrics import confusion_matrix
# Load the dataset
file_path = 'placed.csv' # Update the file path to your dataset
data = pd.read_csv(file_path)
# Display the first few rows of the dataset
print(data.head())
# Assuming the target variable is 'Placed'
X = data.drop('Placed', axis=1) # Features (excluding the target variable)
y = data['Placed'] # Target variable
# Splitting X and y into training and testing sets (60% train, 40% test)
x_train, x_test, y_train, y_test = train_test_split(X, y, test_size=0.4, random_state=1)
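After the split, a Gaussian Naive Bayes model can be trained and its accuracy measured roughly as follows. This sketch assumes synthetic data from make_classification instead of placed.csv, so the accuracy it prints is not a result for the placement dataset.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import accuracy_score, confusion_matrix

# Synthetic stand-in for the placement data: 4 features, binary target
X, y = make_classification(n_samples=150, n_features=4, n_informative=3,
                           n_redundant=1, random_state=1)
x_train, x_test, y_train, y_test = train_test_split(
    X, y, test_size=0.4, random_state=1)

# Train the classifier and predict the held-out samples
gnb = GaussianNB()
gnb.fit(x_train, y_train)
y_pred = gnb.predict(x_test)

accuracy = accuracy_score(y_test, y_pred)
# labels=[0, 1] fixes the matrix layout even if a class is absent
cm = confusion_matrix(y_test, y_pred, labels=[0, 1])
print("Accuracy:", accuracy)
print("Confusion matrix:\n", cm)
```

The accuracy equals the sum of the confusion matrix diagonal divided by the number of test samples.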
Conclusion:
Question:
1. What is Naive Bayes? Why is it called the “Naive” Bayes algorithm?
2. List out the important characteristics of Naive Bayes.
3. What are the main types of Naive Bayes classifiers?
4. List out advantages, disadvantages and limitations of the Naive Bayes algorithm.
Rubrics:
Rubrics 1 2 3 4 5
EXPERIMENT NO: 14
Activation Function
DATE:
Competency and Practical Skills: Python, scikit-learn, machine learning
Relevant CO: Comprehend basic concepts of Neural network and its use in machine
learning.
Objectives:
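import numpy as np
import matplotlib.pyplot as plt
# Input range and activation outputs (the range -10 to 10 is an assumed choice)
x = np.linspace(-10, 10, 400)
sigmoid_output = 1 / (1 + np.exp(-x))
relu_output = np.maximum(0, x)
tanh_output = np.tanh(x)
plt.figure(figsize=(8, 12))
# Sigmoid
plt.subplot(3, 1, 1)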
plt.plot(x, sigmoid_output, label='Sigmoid', color='blue')
plt.title('Sigmoid Activation Function')
plt.xlabel('Input')
plt.ylabel('Output')
plt.grid()
plt.axhline(0, color='black', lw=0.5, ls='--')
plt.axvline(0, color='black', lw=0.5, ls='--')
plt.legend()
# ReLU
plt.subplot(3, 1, 2)
plt.plot(x, relu_output, label='ReLU', color='orange')
plt.title('ReLU Activation Function')
plt.xlabel('Input')
plt.ylabel('Output')
plt.grid()
plt.axhline(0, color='black', lw=0.5, ls='--')
plt.axvline(0, color='black', lw=0.5, ls='--')
plt.legend()
# Tanh
plt.subplot(3, 1, 3)
plt.plot(x, tanh_output, label='Tanh', color='green')
plt.title('Tanh Activation Function')
plt.xlabel('Input')
plt.ylabel('Output')
plt.grid()
plt.axhline(0, color='black', lw=0.5, ls='--')
plt.axvline(0, color='black', lw=0.5, ls='--')
plt.legend()
plt.tight_layout()
plt.show()
OUTPUT:
Conclusion:
Question:
1. What is the role of the activation functions in Neural Networks?
2. List down the names of some popular activation function in neural networks.
3. What is the difference between forward propagation and backward propagation in neural
networks?
4. Why is ReLU the most commonly used activation function?
Rubrics:
Rubrics 1 2 3 4 5
EXPERIMENT NO: 15
model.compile(optimizer='adam', loss='sparse_categorical_crossentropy',
metrics=['accuracy'])
model.fit(x_train, y_train, epochs=5, batch_size=64)
# Evaluate the model and print total parameters
test_loss, test_acc = model.evaluate(x_test, y_test)
total_params = model.count_params()
print(f"Test accuracy: {test_acc:.4f}")
print(f"Total parameters in the CNN: {total_params}")
OUTPUT:
Conclusion:
Question:
1. What do you mean by a Convolutional Neural Network?
2. List out the different layers in a CNN.
3. Briefly explain the two major steps of a CNN (1. Feature learning and 2. Classification).
4. Explain the significance of “parameter sharing” and “sparsity of connections” in a CNN.
Rubrics:
Rubrics 1 2 3 4 5