aiml manual 6th sem
2 Develop a program to load a dataset with at least two numerical columns (e.g., Iris, Titanic). Plot a scatter
plot of two variables and calculate their Pearson correlation coefficient. Write a program to compute the
covariance and correlation matrix for a dataset. Visualize the correlation matrix using a heatmap to know
which variables have strong positive/negative correlations.
3 Develop a program to implement Principal Component Analysis (PCA) for reducing the dimensionality of
the Iris dataset from 4 features to 2.
4 Develop a program to load the Iris dataset. Implement the k-Nearest Neighbors (k-NN) algorithm for
classifying flowers based on their features. Split the dataset into training and testing sets and evaluate the
model using metrics like accuracy and F1-score. Test it for different values of k (e.g., k=1,3,5) and evaluate
the accuracy. Extend the k-NN algorithm to assign weights based on the distance of neighbors (e.g.,
weight = 1/d²). Compare the performance of weighted k-NN and regular k-NN on a synthetic or real-world
dataset.
6 Implement the non-parametric Locally Weighted Regression algorithm in order to fit data points. Select
appropriate data set for your experiment and draw graphs.
7 Develop a program to demonstrate the working of Linear Regression and Polynomial Regression. Use
Boston Housing Dataset for Linear Regression and Auto MPG Dataset (for vehicle fuel efficiency prediction)
for Polynomial Regression.
8 Develop a program to load the Titanic dataset. Split the data into training and test sets. Train a decision tree
classifier. Visualize the tree structure. Evaluate accuracy, precision, recall, and F1-score.
9 Develop a program to implement the Naive Bayesian classifier considering Iris dataset for training. Compute
the accuracy of the classifier, considering the test data.
10 Develop a program to implement k-means clustering using Wisconsin Breast Cancer data set and visualize
the clustering result.
Template for Practical Course and if AEC is a practical Course - Annexure-V
• The examination schedule and the names of the examiners are communicated to the university before
the conduct of the examination. The practical examinations are to be conducted within the schedule
specified in the academic calendar of the University.
• All laboratory experiments are to be included for practical examination.
• (Rubrics) The breakup of marks and the instructions printed on the cover page of the answer
script are to be strictly adhered to by the examiners, OR, based on the course requirements,
the evaluation rubrics shall be decided jointly by the examiners.
• Students can pick one question (experiment) from the question lot prepared jointly by the
examiners.
• Evaluation of the test write-up / conduction procedure and result / viva will be conducted
jointly by the examiners.
• General rubrics suggested for SEE are: write-up - 20%, conduction procedure and result - 60%,
viva-voce - 20% of the maximum marks. SEE for practicals shall be evaluated for 100 marks, and
the scored marks shall be scaled down to 50 marks (however, based on the course type, the
rubrics shall be decided by the examiners).
Change of experiment is allowed only once, and 15% of the marks allotted to the procedure part
are to be made zero.
The minimum duration of SEE is 02 hours.
Suggested Learning Resources:
Books:
● https://www.drssridhar.com/?page_id=1053
● https://www.universitiespress.com/resources?id=9789393330697
● https://onlinecourses.nptel.ac.in/noc23_cs18/preview
Experiment 1:
Develop a program to load a dataset and select one numerical column. Compute mean, median,
mode, standard deviation, variance, and range for a given numerical column in a dataset. Generate
a histogram and boxplot to understand the distribution of the data. Identify any outliers in the data
using IQR. Select a categorical variable from a dataset. Compute the frequency of each category
and display it as a bar chart or pie chart.
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
# Load dataset
df = pd.read_csv('pgm1.csv')
# Display the first few rows of the dataset
print("First few rows of the dataset:")
print(df.head())
# Data Cleaning and Preprocessing
df = df.dropna() # Remove rows with missing values
# Numerical Analysis on Scores
score_columns = ['math score', 'reading score', 'writing score']
# Compute statistics
for column in score_columns:
    mean = df[column].mean()
    median = df[column].median()
    mode = df[column].mode()[0]
    std_dev = df[column].std()
    variance = df[column].var()
    data_range = df[column].max() - df[column].min()
    # Display statistics
    print(f'\nStatistics for {column}:')
    print(f'Mean: {mean:.2f}')
    print(f'Median: {median:.2f}')
    print(f'Mode: {mode:.2f}')
    print(f'Standard Deviation: {std_dev:.2f}')
    print(f'Variance: {variance:.2f}')
    print(f'Range: {data_range:.2f}')
    # Generate histogram
    plt.figure(figsize=(10, 5))
    plt.hist(df[column], bins=10, color='blue', alpha=0.7)
    plt.title(f'Histogram of {column}')
    plt.xlabel(column)
    plt.ylabel('Frequency')
    plt.grid(axis='y')
    plt.show()
    # Generate boxplot
    plt.figure(figsize=(10, 5))
    sns.boxplot(x=df[column])
    plt.title(f'Boxplot of {column}')
    plt.show()
    # Identify outliers using IQR
    Q1 = df[column].quantile(0.25)
    Q3 = df[column].quantile(0.75)
    IQR = Q3 - Q1
    lower_bound = Q1 - 1.5 * IQR
    upper_bound = Q3 + 1.5 * IQR
    outliers = df[(df[column] < lower_bound) | (df[column] > upper_bound)]
    print(f'Outliers in {column}:')
    print(outliers)
# Categorical Analysis on Gender
categorical_column = 'gender'
# Compute frequency of each category
frequency = df[categorical_column].value_counts()
# Display frequency as bar chart
plt.figure(figsize=(10, 5))
frequency.plot(kind='bar', color='orange')
plt.title(f'Frequency of Students by {categorical_column}')
plt.xlabel(categorical_column)
plt.ylabel('Frequency')
plt.show()
# Display frequency as pie chart
plt.figure(figsize=(8, 8))
frequency.plot(kind='pie', autopct='%1.1f%%', startangle=90)
plt.title(f'Distribution of Students by {categorical_column}')
plt.ylabel('')
plt.show()
Experiment 2:
Develop a program to load a dataset with at least two numerical columns (e.g., Iris, Titanic). Plot
a scatter plot of two variables and calculate their Pearson correlation coefficient. Write a program
to compute the covariance and correlation matrix for a dataset. Visualize the correlation matrix
using a heatmap to know which variables have strong positive/negative correlations.
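The manual provides no reference program for this experiment. Below is a minimal sketch, assuming the Iris dataset bundled with scikit-learn (the column names such as 'sepal length (cm)' come from that loader; any dataset with at least two numerical columns can be substituted).
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
from sklearn.datasets import load_iris
# Load the Iris dataset as a DataFrame
iris = load_iris(as_frame=True)
df = iris.data
# Scatter plot of two numerical variables
x_col, y_col = 'sepal length (cm)', 'petal length (cm)'
plt.scatter(df[x_col], df[y_col], alpha=0.6)
plt.xlabel(x_col)
plt.ylabel(y_col)
plt.title(f'Scatter plot of {x_col} vs {y_col}')
plt.show()
# Pearson correlation coefficient between the two variables
pearson_r = df[x_col].corr(df[y_col])
print(f'Pearson correlation between {x_col} and {y_col}: {pearson_r:.3f}')
# Covariance and correlation matrices for the whole dataset
print('Covariance matrix:')
print(df.cov())
corr_matrix = df.corr()
print('Correlation matrix:')
print(corr_matrix)
# Heatmap of the correlation matrix to highlight strong positive/negative correlations
plt.figure(figsize=(8, 6))
sns.heatmap(corr_matrix, annot=True, cmap='coolwarm', vmin=-1, vmax=1)
plt.title('Correlation Matrix Heatmap')
plt.show()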
Experiment 3
Develop a program to implement Principal Component Analysis (PCA) for reducing the
dimensionality of the Iris dataset from 4 features to 2.
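The manual provides no reference program for this experiment either. A minimal sketch, assuming the Iris dataset is loaded directly from scikit-learn rather than from a local CSV file:
import matplotlib.pyplot as plt
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler
# Load the Iris dataset (4 features, 3 classes)
iris = load_iris()
X, y = iris.data, iris.target
# Standardize the features so each contributes equally to the principal components
X_scaled = StandardScaler().fit_transform(X)
# Reduce the dimensionality from 4 features to 2 principal components
pca = PCA(n_components=2)
X_pca = pca.fit_transform(X_scaled)
print('Explained variance ratio:', pca.explained_variance_ratio_)
# Scatter plot of the data in the reduced 2-D space, coloured by class
plt.figure(figsize=(8, 6))
for label, name in enumerate(iris.target_names):
    plt.scatter(X_pca[y == label, 0], X_pca[y == label, 1], label=name, alpha=0.7)
plt.xlabel('Principal Component 1')
plt.ylabel('Principal Component 2')
plt.title('PCA of the Iris Dataset (4 features to 2 components)')
plt.legend()
plt.show()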
Experiment 4:
Develop a program to load the Iris dataset. Implement the k-Nearest Neighbours (k-NN) algorithm
for classifying flowers based on their features. Split the dataset into training and testing sets and
evaluate the model using metrics like accuracy and F1-score. Test it for different values of k (e.g.,
k=1,3,5) and evaluate the accuracy. Extend the k-NN algorithm to assign weights based on the
distance of neighbours (e.g., weight = 1/d²). Compare the performance of weighted k-NN and
regular k-NN on a synthetic or real-world dataset.
import pandas as pd
import os
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score
def load_data(csv_file):
    if not os.path.exists(csv_file):
        raise FileNotFoundError(f"Error: The file '{csv_file}' was not found. Please check the filename and path.")
    data = pd.read_csv(csv_file)
    X = data.iloc[:, :-1].values  # Features
    y = data.iloc[:, -1].values   # Target
    return train_test_split(X, y, test_size=0.2, random_state=42, stratify=y)

def train_knn(X_train, y_train, X_test, y_test, k, weighted=False):
    knn = KNeighborsClassifier(n_neighbors=k, weights='distance' if weighted else 'uniform')
    knn.fit(X_train, y_train)
    y_pred = knn.predict(X_test)
    return accuracy_score(y_test, y_pred)

def main():
    csv_file = "iris_data.csv"  # Ensure the file is in the same directory
    X_train, X_test, y_train, y_test = load_data(csv_file)
    scaler = StandardScaler()
    X_train = scaler.fit_transform(X_train)
    X_test = scaler.transform(X_test)
    for k in [1, 3, 5, 7, 9]:
        print(f"k={k}, Accuracy (Regular): {train_knn(X_train, y_train, X_test, y_test, k):.4f}")
        print(f"k={k}, Accuracy (Weighted): {train_knn(X_train, y_train, X_test, y_test, k, weighted=True):.4f}")

if __name__ == "__main__":
    main()
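The reference program above reports only accuracy and uses scikit-learn's built-in 'distance' weighting, which weights neighbours by 1/d rather than by 1/d² as the experiment statement asks. A minimal sketch of the two missing pieces is given below; it reuses the variable names from the program above, and the custom weight function and the epsilon term are assumptions made for illustration.
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score, f1_score

def inverse_square_weights(distances):
    # weight = 1 / d^2; a small epsilon avoids division by zero when a test point
    # coincides exactly with a training point
    return 1.0 / (distances ** 2 + 1e-9)

def evaluate_knn(X_train, y_train, X_test, y_test, k, weights='uniform'):
    # weights may be 'uniform', 'distance', or a callable such as inverse_square_weights
    knn = KNeighborsClassifier(n_neighbors=k, weights=weights)
    knn.fit(X_train, y_train)
    y_pred = knn.predict(X_test)
    return accuracy_score(y_test, y_pred), f1_score(y_test, y_pred, average='macro')

# Example usage inside main(), after the features have been scaled:
# acc, f1 = evaluate_knn(X_train, y_train, X_test, y_test, k=5, weights=inverse_square_weights)
# print(f"k=5, Accuracy (1/d^2 weighted): {acc:.4f}, F1-score (macro): {f1:.4f}")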
Experiment 5:
Implement the non-parametric Locally Weighted Regression algorithm in order to fit data points.
Select appropriate data set for your experiment and draw graphs
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
def gaussian_kernel(X, x_query, tau):
    """Compute Gaussian weights for all training points relative to x_query."""
    weights = np.exp(-np.square(X[:, 1] - x_query[1]) / (2 * tau ** 2))
    return np.diag(weights)  # Convert to diagonal weight matrix

def locally_weighted_regression(X_train, y_train, x_query, tau):
    """Compute LWR prediction for a single query point x_query."""
    W = gaussian_kernel(X_train, x_query, tau)  # Weight each training point by its proximity to x_query
    theta = np.linalg.pinv(X_train.T @ W @ X_train) @ (X_train.T @ W @ y_train)
    return x_query @ theta  # Return prediction

def predict_lwr(X_train, y_train, X_test, tau):
    """Compute LWR predictions for multiple query points."""
    return np.array([locally_weighted_regression(X_train, y_train, x, tau) for x in X_test])
# Load dataset from CSV
data = pd.read_csv("data.csv")
X = data["X"].values.reshape(-1, 1)
y = data["y"].values.reshape(-1, 1)
# Add bias term to X
X_bias = np.hstack([np.ones_like(X), X])
# Split dataset
X_train, X_test, y_train, y_test = train_test_split(X_bias, y, test_size=0.2, random_state=42)
# Define tau (bandwidth parameter)
tau = 0.5
# Compute predictions
y_pred = predict_lwr(X_train, y_train, X_test, tau)
# Plot results
plt.scatter(X, y, label="Data", color="blue", alpha=0.5)
X_test_sorted = X_test[:, 1].argsort()
plt.plot(X_test[:, 1][X_test_sorted], y_pred[X_test_sorted], label="LWR Fit", color="red")
plt.xlabel("X")
plt.ylabel("y")
plt.legend()
plt.title("Locally Weighted Regression (LWR)")
plt.show()
Experiment 6:
Develop a program to demonstrate the working of Linear Regression and Polynomial Regression.
Use Boston Housing Dataset for Linear Regression and Auto MPG Dataset (for vehicle fuel
efficiency prediction) for Polynomial Regression.
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures
from sklearn.metrics import mean_squared_error
# Function to clean Auto MPG dataset (handling missing or non-numeric values)
def clean_auto_mpg_data(df):
    # Convert 'horsepower' column to numeric, coercing errors to NaN
    df['horsepower'] = pd.to_numeric(df['horsepower'], errors='coerce')
    # Fill missing values in 'horsepower' column with the mean of the column
    df['horsepower'] = df['horsepower'].fillna(df['horsepower'].mean())
    return df
# --- Boston Housing Dataset ---
# Load Boston Housing Dataset (assuming the file is in the same directory)
boston_df = pd.read_csv("boston_housing.csv")
# Selecting average number of rooms (RM) as the feature and price (PRICE) as the target
X_boston = boston_df[['RM']].values
y_boston = boston_df['PRICE'].values
# Split the dataset into training and testing sets
X_train_boston, X_test_boston, y_train_boston, y_test_boston = train_test_split(X_boston,
y_boston, test_size=0.2, random_state=42)
# Linear Regression for Boston Housing Dataset
linear_model = LinearRegression()
linear_model.fit(X_train_boston, y_train_boston)
# Predictions for the test set
y_pred_boston = linear_model.predict(X_test_boston)
# Plotting Linear Regression results
plt.scatter(X_test_boston, y_test_boston, color='blue', label='Actual')
plt.plot(X_test_boston, y_pred_boston, color='red', label='Predicted')
plt.xlabel("Average number of rooms (RM)")
plt.ylabel("House Price")
plt.title("Linear Regression - Boston Housing Dataset")
plt.legend()
plt.show()
# Print Mean Squared Error for Linear Regression on Boston dataset
print("Boston Housing Linear Regression MSE:", mean_squared_error(y_test_boston,
y_pred_boston))
# --- Auto MPG Dataset ---
# Load Auto MPG Dataset
auto_mpg_df = pd.read_csv("auto_mpg.csv")
# Clean the dataset
auto_mpg_df = clean_auto_mpg_data(auto_mpg_df)
# Selecting 'horsepower' as the feature and 'mpg' as the target
X_auto = auto_mpg_df[['horsepower']].values
y_auto = auto_mpg_df['mpg'].values
# Split the dataset into training and testing sets
X_train_auto, X_test_auto, y_train_auto, y_test_auto = train_test_split(X_auto, y_auto,
test_size=0.2, random_state=42)
# Polynomial Regression for Auto MPG Dataset
poly = PolynomialFeatures(degree=2)
X_train_poly = poly.fit_transform(X_train_auto)
X_test_poly = poly.transform(X_test_auto)
poly_model = LinearRegression()
poly_model.fit(X_train_poly, y_train_auto)
# Predictions for the test set
y_poly_pred = poly_model.predict(X_test_poly)
# Plotting Polynomial Regression results
plt.scatter(X_test_auto, y_test_auto, color='blue', label='Actual')
plt.scatter(X_test_auto, y_poly_pred, color='red', label='Predicted')
plt.xlabel("Horsepower")
plt.ylabel("MPG")
plt.title("Polynomial Regression - Auto MPG Dataset")
plt.legend()
plt.show()
# Print Mean Squared Error for Polynomial Regression on Auto MPG dataset
print("Auto MPG Polynomial Regression MSE:", mean_squared_error(y_test_auto, y_poly_pred))
Experiment 7:
Develop a program to load the Titanic dataset. Split the data into training and test sets. Train a
decision tree classifier. Visualize the tree structure. Evaluate accuracy, precision, recall, and F1-
score.
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier, plot_tree
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score
from sklearn.preprocessing import LabelEncoder
# Load dataset
data=pd.read_csv("pgm7.csv")
# Selecting relevant features and handling missing values
data = data[['Survived', 'Pclass', 'Sex', 'Age', 'SibSp', 'Parch', 'Fare', 'Embarked']]
data.dropna(inplace=True)
# Encoding categorical variables
le_sex = LabelEncoder()
le_embarked = LabelEncoder()
data['Sex'] = le_sex.fit_transform(data['Sex'])
data['Embarked'] = le_embarked.fit_transform(data['Embarked'])
# Splitting data into features and target variable
X = data.drop(columns=['Survived'])
y = data['Survived']
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# Train decision tree classifier
clf = DecisionTreeClassifier(random_state=42)
clf.fit(X_train, y_train)
# Visualizing the tree structure
plt.figure(figsize=(15, 10))
plot_tree(clf, feature_names=X.columns, class_names=['Not Survived', 'Survived'], filled=True)
plt.show()
# Make predictions
y_pred = clf.predict(X_test)
# Evaluate performance
accuracy = accuracy_score(y_test, y_pred)
precision = precision_score(y_test, y_pred)
recall = recall_score(y_test, y_pred)
f1 = f1_score(y_test, y_pred)
print(f"Accuracy: {accuracy:.2f}")
print(f"Precision: {precision:.2f}")
print(f"Recall: {recall:.2f}")
print(f"F1-score: {f1:.2f}")
Experiment 8:
Develop a program to implement the Naive Bayesian classifier considering Iris dataset for
training. Compute the accuracy of the classifier, considering the test data.
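The manual provides no reference program for this experiment. A minimal sketch using Gaussian Naive Bayes, assuming the Iris dataset is loaded from scikit-learn (loading it from a local CSV, as in the other experiments, works equally well):
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import accuracy_score, classification_report
# Load the Iris dataset and split it into training and test sets
iris = load_iris()
X, y = iris.data, iris.target
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42, stratify=y)
# Train the Gaussian Naive Bayes classifier on the training data
model = GaussianNB()
model.fit(X_train, y_train)
# Predict on the test data and report the accuracy
y_pred = model.predict(X_test)
print(f"Accuracy: {accuracy_score(y_test, y_pred):.4f}")
print(classification_report(y_test, y_pred, target_names=iris.target_names))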
Experiment 9:
Develop a program to implement k-means clustering using Wisconsin Breast Cancer data set
and visualize the clustering result.
import os
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler, LabelEncoder
# Set environment variable to avoid memory leak warning on Windows
os.environ["OMP_NUM_THREADS"] = "3"
# Load the dataset from a CSV file
df = pd.read_csv('breast_cancer_data.csv')
# Encode categorical columns if present
for col in df.select_dtypes(include=['object']).columns:
    df[col] = LabelEncoder().fit_transform(df[col])
# Assume the last column is the target (drop it for clustering)
df_features = df.iloc[:, :-1]
# Standardize the data
scaler = StandardScaler()
df_scaled = scaler.fit_transform(df_features)
# Apply K-Means Clustering
kmeans = KMeans(n_clusters=2, random_state=42, n_init=10)
kmeans.fit(df_scaled)
labels = kmeans.labels_
# Reduce dimensions using PCA for visualization
pca = PCA(n_components=2)
df_pca = pca.fit_transform(df_scaled)
# Scatter plot of clusters
plt.figure(figsize=(8, 6))
plt.scatter(df_pca[:, 0], df_pca[:, 1], c=labels, cmap='viridis', alpha=0.6)
plt.xlabel('Principal Component 1')
plt.ylabel('Principal Component 2')
plt.title('K-Means Clustering on Breast Cancer Dataset')
plt.colorbar(label='Cluster')
plt.show()