
Machine Learning Laboratory BCSL606

Program 1: Develop a program to create histograms for all numerical features and analyze the distribution of each feature. Generate box plots for all numerical features and identify any outliers. Use the California Housing dataset.
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import fetch_california_housing

# Load the California Housing dataset
housing_data = fetch_california_housing(as_frame=True)
data = housing_data['data']
print(data)
data['MedHouseVal'] = housing_data['target']  # Adding target variable for completeness

# Histograms for all numerical features
print("Creating histograms for all numerical features...")
for column in data.columns:
    plt.figure(figsize=(8, 5))
    plt.hist(data[column], bins=50, edgecolor='k', alpha=0.7)
    plt.title(f'Distribution of {column}')
    plt.xlabel(column)
    plt.ylabel('Frequency')
    plt.grid(axis='y', linestyle='--', alpha=0.7)
    plt.show()

# Box plots for all numerical features
print("Creating box plots for all numerical features to identify outliers...")
for column in data.columns:
    plt.figure(figsize=(8, 5))
    plt.boxplot(data[column], vert=False, patch_artist=True,
                boxprops=dict(facecolor='skyblue', color='blue'))
    plt.title(f'Box Plot of {column}')
    plt.xlabel(column)
    plt.grid(axis='x', linestyle='--', alpha=0.7)
    plt.show()

# Identify outliers using IQR
print("Identifying potential outliers using the IQR method...")
outliers = {}
for column in data.columns:
    Q1 = data[column].quantile(0.25)
    Q3 = data[column].quantile(0.75)
    IQR = Q3 - Q1
    lower_bound = Q1 - 1.5 * IQR
    upper_bound = Q3 + 1.5 * IQR
    outliers[column] = data[(data[column] < lower_bound) | (data[column] > upper_bound)]
    print(f"{column}:")
    print(f"Lower Bound: {lower_bound}, Upper Bound: {upper_bound}")
    print(f"Number of outliers: {len(outliers[column])}")
    print("---")

Output:
MedInc HouseAge AveRooms AveBedrms Population AveOccup Latitude \
0 8.3252 41.0 6.984127 1.023810 322.0 2.555556 37.88
1 8.3014 21.0 6.238137 0.971880 2401.0 2.109842 37.86
2 7.2574 52.0 8.288136 1.073446 496.0 2.802260 37.85
3 5.6431 52.0 5.817352 1.073059 558.0 2.547945 37.85
4 3.8462 52.0 6.281853 1.081081 565.0 2.181467 37.85
... ... ... ... ... ... ... ...
20635 1.5603 25.0 5.045455 1.133333 845.0 2.560606 39.48
20636 2.5568 18.0 6.114035 1.315789 356.0 3.122807 39.49
20637 1.7000 17.0 5.205543 1.120092 1007.0 2.325635 39.43
20638 1.8672 18.0 5.329513 1.171920 741.0 2.123209 39.43
20639 2.3886 16.0 5.254717 1.162264 1387.0 2.616981 39.37

Longitude
0 -122.23
1 -122.22
2 -122.24
3 -122.25
4 -122.25
... ...
20635 -121.09
20636 -121.21
20637 -121.22
20638 -121.32
20639 -121.24

[20640 rows x 8 columns]


Creating histograms for all numerical features...

Identifying potential outliers using the IQR method...


MedInc:
Lower Bound: -0.7063750000000004, Upper Bound: 8.013024999999999
Number of outliers: 681
---
HouseAge:
Lower Bound: -10.5, Upper Bound: 65.5
Number of outliers: 0
---
AveRooms:
Lower Bound: 2.023219161170969, Upper Bound: 8.469878027106942
Number of outliers: 511
---


AveBedrms:
Lower Bound: 0.8659085155701288, Upper Bound: 1.2396965968190603
Number of outliers: 1424
---
Population:
Lower Bound: -620.0, Upper Bound: 3132.0
Number of outliers: 1196
---
AveOccup:
Lower Bound: 1.1509614824735064, Upper Bound: 4.5610405893536905
Number of outliers: 711
---
Latitude:
Lower Bound: 28.259999999999998, Upper Bound: 43.38
Number of outliers: 0
---
Longitude:
Lower Bound: -127.48499999999999, Upper Bound: -112.32500000000002
Number of outliers: 0
---
MedHouseVal:
Lower Bound: -0.9808749999999995, Upper Bound: 4.824124999999999
Number of outliers: 1071
---


Program 2: Develop a program to compute the correlation matrix to understand the relationships between pairs of features. Visualize the correlation matrix using a heatmap to know which variables have strong positive/negative correlations. Create a pair plot to visualize pairwise relationships between features. Use the California Housing dataset.

import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
from sklearn.datasets import fetch_california_housing

# Load the California Housing dataset
housing_data = fetch_california_housing(as_frame=True)
data = housing_data['data']
data['MedHouseVal'] = housing_data['target']  # Adding target variable for completeness

# Compute the correlation matrix


print("Computing the correlation matrix...")
correlation_matrix = data.corr()
print(correlation_matrix)

# Visualize the correlation matrix using a heatmap
print("Visualizing the correlation matrix using a heatmap...")
plt.figure(figsize=(10, 8))
sns.heatmap(correlation_matrix, annot=True, fmt=".2f", cmap="coolwarm",
            cbar=True, square=True)
plt.title("Correlation Matrix Heatmap")
plt.show()


# Create a pair plot to visualize pairwise relationships between features
print("Creating a pair plot to visualize pairwise relationships between features...")
sns.pairplot(data, diag_kind='kde', corner=True)
plt.show()
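To read the strongest relationships off the matrix programmatically, a minimal sketch (reusing the correlation_matrix computed above; the 0.7 cutoff is an arbitrary choice) could be:

# List feature pairs whose absolute correlation exceeds a chosen threshold
corr_pairs = correlation_matrix.unstack()
strong = corr_pairs[(corr_pairs.abs() > 0.7) & (corr_pairs.abs() < 1.0)]
print(strong.drop_duplicates().sort_values(key=abs, ascending=False))

With the matrix shown in the output below, this would flag AveRooms/AveBedrms (0.85) and Latitude/Longitude (-0.92).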
Output:
Computing the correlation matrix...
MedInc HouseAge AveRooms AveBedrms Population AveOccup \
MedInc 1.000000 -0.119034 0.326895 -0.062040 0.004834 0.018766
HouseAge -0.119034 1.000000 -0.153277 -0.077747 -0.296244 0.013191
AveRooms 0.326895 -0.153277 1.000000 0.847621 -0.072213 -0.004852
AveBedrms -0.062040 -0.077747 0.847621 1.000000 -0.066197 -0.006181
Population 0.004834 -0.296244 -0.072213 -0.066197 1.000000 0.069863
AveOccup 0.018766 0.013191 -0.004852 -0.006181 0.069863 1.000000
Latitude -0.079809 0.011173 0.106389 0.069721 -0.108785 0.002366
Longitude -0.015176 -0.108197 -0.027540 0.013344 0.099773 0.002476
MedHouseVal 0.688075 0.105623 0.151948 -0.046701 -0.024650 -0.023737

Latitude Longitude MedHouseVal


MedInc -0.079809 -0.015176 0.688075
HouseAge 0.011173 -0.108197 0.105623
AveRooms 0.106389 -0.027540 0.151948
AveBedrms 0.069721 0.013344 -0.046701
Population -0.108785 0.099773 -0.024650
AveOccup 0.002366 0.002476 -0.023737
Latitude 1.000000 -0.924664 -0.144160
Longitude -0.924664 1.000000 -0.045967
MedHouseVal -0.144160 -0.045967 1.000000
Visualizing the correlation matrix using a heatmap...


Program 3: Develop a program to implement Principal Component Analysis (PCA) for reducing the dimensionality of the Iris dataset from 4 features to 2.
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.datasets import load_iris
from sklearn.preprocessing import StandardScaler
from numpy.linalg import eig

# Load the Iris dataset


iris = load_iris()
iris_data = iris.data
iris_target = iris.target
iris_feature_names = iris.feature_names

# Convert to DataFrame
df = pd.DataFrame(iris_data, columns=iris_feature_names)
df['Target'] = iris_target

# Example Data (First 5 Samples for Explanation)


example_data = iris_data[:5]
print("Example Data (First 5 Samples):")
print(example_data)

# Step 1: Standardize the Data


scaler = StandardScaler()
iris_data_scaled = scaler.fit_transform(iris_data)
example_data_scaled = scaler.transform(example_data)
print("\nStandardized Example Data:")
print(example_data_scaled)


# Step 2: Compute Covariance Matrix Manually
n_samples = iris_data_scaled.shape[0]
mean_vector = np.mean(iris_data_scaled, axis=0)
X_centered = iris_data_scaled - mean_vector
cov_matrix_manual = (1 / (n_samples - 1)) * np.dot(X_centered.T, X_centered)
print("\nManually Computed Covariance Matrix:")
print(cov_matrix_manual)

# Step 3: Compute Eigenvalues and Eigenvectors Manually


eigenvalues_manual, eigenvectors_manual = eig(cov_matrix_manual)
print("\nManually Computed Eigenvalues:")
print(eigenvalues_manual)
print("\nManually Computed Eigenvectors:")
print(eigenvectors_manual)

# Step 4: Select Top 2 Principal Components


sorted_indices = np.argsort(eigenvalues_manual)[::-1]
top_2_indices = sorted_indices[:2]
top_2_eigenvectors = eigenvectors_manual[:, top_2_indices]
print("\nTop 2 Eigenvectors:")
print(top_2_eigenvectors)

# Step 5: Transform Data to 2D


iris_pca = np.dot(iris_data_scaled, top_2_eigenvectors)
example_pca = np.dot(example_data_scaled, top_2_eigenvectors)
print("\nReduced 2D Example Data:")
print(example_pca)

# Step 6: Visualize PCA Results
iris_pca_df = pd.DataFrame(data=iris_pca, columns=["Principal Component 1", "Principal Component 2"])
iris_pca_df['Target'] = iris_target


plt.figure(figsize=(8, 6))
sns.scatterplot(
    x="Principal Component 1", y="Principal Component 2", hue="Target",
    data=iris_pca_df, palette="viridis", s=100, alpha=0.8
)
plt.title("PCA of Iris Dataset")
plt.xlabel("Principal Component 1")
plt.ylabel("Principal Component 2")
plt.legend(title="Target", labels=iris.target_names)
plt.grid(alpha=0.5)
plt.show()
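As a cross-check on the manual eigendecomposition, an equivalent transform with scikit-learn's built-in PCA can be run on the same standardized data (a minimal sketch; component signs may be flipped relative to the manual result):

from sklearn.decomposition import PCA

pca = PCA(n_components=2)
iris_pca_sklearn = pca.fit_transform(iris_data_scaled)
print("Explained variance ratio:", pca.explained_variance_ratio_)
print(iris_pca_sklearn[:5])  # compare with example_pca above, up to sign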
Output:


Program 4: For a given set of training data examples stored in a .CSV file, implement and demonstrate the Find-S algorithm to output a description of the set of all hypotheses consistent with the training examples.
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt

# Implement Find-S algorithm
print("Implementing Find-S algorithm...")

def find_s_algorithm(csv_file):
    # Load the dataset
    dataset = pd.read_csv(csv_file)
    attributes = dataset.iloc[:, :-1].values
    labels = dataset.iloc[:, -1].values

    # Initialize the hypothesis with the first positive example
    for i, label in enumerate(labels):
        if label == 'Yes':
            hypothesis = list(attributes[i])
            break  # Stop after finding the first "Yes"

    # Generalize the hypothesis over the remaining positive examples
    for i in range(len(labels)):
        if labels[i] == 'Yes':  # Only process positive examples
            for j in range(len(hypothesis)):
                if hypothesis[j] != attributes[i][j]:
                    hypothesis[j] = '?'  # Generalize

    return hypothesis

csv_file = "/content/find_s_example.csv"  # Provide the path to your CSV file
final_hypothesis = find_s_algorithm(csv_file)
print("Final Hypothesis:", final_hypothesis)

Output:
Implementing Find-S algorithm...
Final Hypothesis: ['Sunny', 'Warm', '?', '?', '?', '?']


Program 5: Develop a program to implement the k-Nearest Neighbour algorithm to classify 100 randomly generated values of x in the range [0,1]. Perform the following based on the dataset generated.
1. Label the first 50 points {x1, ..., x50} as follows: if (xi <= 0.5), then xi belongs to Class1, else xi belongs to Class2.
2. Classify the remaining points, x51, ..., x100, using KNN. Perform this for k = 1, 2, 3, 4, 5, 20, 30.

import numpy as np
import matplotlib.pyplot as plt
from sklearn.neighbors import KNeighborsClassifier

# Step 1: Generate 100 random values in the range [0,1]


np.random.seed(42) # For reproducibility
x = np.random.rand(100).reshape(-1, 1) # Reshape for sklearn compatibility

print(x[:5])

# Step 2: Label the first 50 points
labels = np.array([1 if xi <= 0.5 else 2 for xi in x[:50]])  # Class 1 if xi <= 0.5, else Class 2

# Step 3: Train KNN classifier for several values of k
k_values = [1, 2, 3, 4, 5, 20, 30]
classified_labels = {}

for k in k_values:
    knn = KNeighborsClassifier(n_neighbors=k)
    knn.fit(x[:50], labels)  # Train using first 50 points
    classified_labels[k] = knn.predict(x[50:])  # Classify remaining 50 points

# Step 4: Visualize the results


plt.figure(figsize=(10, 6))
plt.scatter(x[:50], labels, color='blue', label='Training Data')

plt.scatter(x[50:], classified_labels[1], color='red', marker='x', label='Classified Data (k=1)')
plt.xlabel('X values')
plt.ylabel('Class')
plt.title('KNN Classification of Random Values')
plt.legend()
plt.show()

# Print classification results for different k values
for k in k_values:
    print(f"Classification results for k={k}: {classified_labels[k]}")

Output:

Classification results for k=1: [2 2 2 2 2 2 1 1 1 1 1 1 2 1 1 2 1 2 1 2 2 1 1 2 2 2 2 1 1 1 2 2 1 1 1 1 2 2 2 1 1 2 2 2 2 1 2 1 1 1]
Classification results for k=2: [2 2 2 2 2 2 1 1 1 1 1 1 2 1 1 2 1 2 1 2 2 1 1 2 2 2 2 1 1 1 2 2 1 1 1 1 2 2 2 1 1 2 2 2 2 1 2 1 1 1]
Classification results for k=3: [2 2 2 2 2 2 1 1 1 1 1 1 2 1 1 2 1 2 1 2 2 1 1 2 2 2 2 1 1 1 2 2 1 1 1 1 2 2 2 1 1 2 2 2 2 2 2 1 1 1]
Classification results for k=4: [2 2 2 2 2 2 1 1 1 1 1 1 2 1 1 2 1 2 1 2 2 1 1 2 2 2 2 1 1 1 2 2 1 1 1 1 2 2 2 1 1 2 2 2 2 2 2 1 1 1]
Classification results for k=5: [2 2 2 2 2 2 1 1 1 1 1 1 2 1 1 2 1 2 1 2 2 1 1 2 2 2 2 1 1 1 2 2 1 1 1 1 2 2 2 1 1 2 2 2 2 2 2 1 1 1]
Classification results for k=20: [2 2 2 2 2 2 1 1 1 1 1 1 2 1 1 2 1 2 1 2 2 1 1 2 2 2 2 1 1 1 2 2 1 1 1 1 2 2 2 1 1 2 2 2 2 2 2 1 1 1]
Classification results for k=30: [2 2 2 2 2 2 1 1 1 1 1 1 2 1 1 2 1 2 1 2 2 1 1 2 2 2 2 1 1 1 2 2 1 1 1 1 2 2 2 1 1 2 2 2 2 1 2 1 1 1]


Program 6: Implement the non-parametric Locally Weighted Regression algorithm in order to fit data points. Select an appropriate data set for your experiment and draw graphs.
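For reference, the quantity the code below computes at each query point x_q is the standard weighted least-squares fit:

    w_i = \exp\!\left(-\frac{(x_i - x_q)^2}{2\tau^2}\right), \qquad
    \hat{\theta}(x_q) = (X^\top W X)^{-1} X^\top W y, \qquad
    \hat{y}(x_q) = [1 \;\; x_q]\,\hat{\theta}(x_q)

where W = diag(w_1, ..., w_m) and X carries a bias column of ones; the code uses a pseudo-inverse in place of the plain inverse for numerical stability.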

import numpy as np
import matplotlib.pyplot as plt

def gaussian_kernel(x, x_query, tau):
    """Compute the Gaussian weight for each training sample."""
    return np.exp(-np.square(x - x_query) / (2 * tau**2))

def locally_weighted_regression(x_train, y_train, x_query, tau):
    """Perform Locally Weighted Regression (LWR) for a given query point."""
    m = len(x_train)
    W = np.diag(gaussian_kernel(x_train, x_query, tau))  # Compute weights

    X_bias = np.c_[np.ones(m), x_train]  # Add bias term
    theta = np.linalg.pinv(X_bias.T @ W @ X_bias) @ (X_bias.T @ W @ y_train)

    return np.array([1, x_query]) @ theta  # Predict output for x_query

# Generate synthetic dataset
np.random.seed(42)
x_train = np.linspace(0, 10, 100)
y_train = np.sin(x_train) + np.random.normal(0, 0.2, 100)  # Sinusoidal data with noise

# Define tau (bandwidth parameter)
tau_values = [0.1, 0.5, 1, 5]
x_test = np.linspace(0, 10, 100)  # Test data

plt.figure(figsize=(12, 8))
for tau in tau_values:
    y_pred = np.array([locally_weighted_regression(x_train, y_train, xq, tau) for xq in x_test])
    plt.plot(x_test, y_pred, label=f'tau={tau}')

# Plot training data


plt.scatter(x_train, y_train, color='black', label='Training Data', alpha=0.5)
plt.xlabel('X')
plt.ylabel('Y')
plt.title('Locally Weighted Regression (LWR) with Different Tau Values')
plt.legend()
plt.show()


Output:


Program 7: Develop a program to demonstrate the working of Linear Regression and Polynomial Regression. Use the Boston Housing Dataset for Linear Regression and the Auto MPG Dataset (for vehicle fuel efficiency prediction) for Polynomial Regression.

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import PolynomialFeatures, StandardScaler
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.pipeline import make_pipeline

# Load Boston Housing Dataset from CSV


boston_df = pd.read_csv('/content/Boston.csv')
print("Boston CSV Columns:", boston_df.columns)
X_boston = boston_df[['rm']].values
y_boston = boston_df['medv'].values

X_train, X_test, y_train, y_test = train_test_split(X_boston, y_boston, test_size=0.2, random_state=42)

linear_reg = LinearRegression()
linear_reg.fit(X_train, y_train)
y_pred = linear_reg.predict(X_test)

# Plot results
plt.figure(figsize=(10, 5))
plt.scatter(X_test, y_test, color='blue', label='Actual')
plt.plot(X_test, y_pred, color='red', linewidth=2, label='Predicted')
plt.xlabel('Average Number of Rooms (RM)')
plt.ylabel('Housing Price')


plt.title('Linear Regression on Boston Housing Dataset')


plt.legend()
plt.show()

print(f"Mean Squared Error (Linear Regression): {mean_squared_error(y_test,


y_pred)}")

# Polynomial Regression on Auto MPG Dataset
auto_mpg_url = "https://archive.ics.uci.edu/ml/machine-learning-databases/auto-mpg/auto-mpg.data"
column_names = ['mpg', 'cylinders', 'displacement', 'horsepower', 'weight',
                'acceleration', 'model_year', 'origin']

auto_df = pd.read_csv(auto_mpg_url, delim_whitespace=True, names=column_names, na_values='?')
auto_df = auto_df.dropna()  # Remove rows with missing values

X_auto = auto_df[['horsepower']].astype(float).values  # Using 'horsepower' as feature
y_auto = auto_df['mpg'].values

X_train, X_test, y_train, y_test = train_test_split(X_auto, y_auto, test_size=0.2, random_state=42)

# Polynomial Regression (degree=3)
poly_model = make_pipeline(PolynomialFeatures(degree=3), StandardScaler(), LinearRegression())
poly_model.fit(X_train, y_train)
y_poly_pred = poly_model.predict(X_test)

# Plot results
X_test_sorted, y_poly_pred_sorted = zip(*sorted(zip(X_test.flatten(), y_poly_pred)))


plt.figure(figsize=(10, 5))
plt.scatter(X_test, y_test, color='blue', label='Actual')
plt.plot(X_test_sorted, y_poly_pred_sorted, color='red', linewidth=2, label='Predicted')
plt.xlabel('Horsepower')
plt.ylabel('MPG')
plt.title('Polynomial Regression on Auto MPG Dataset')
plt.legend()
plt.show()

print(f"Mean Squared Error (Polynomial Regression):


{mean_squared_error(y_test, y_poly_pred)}")
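To see how the polynomial degree affects the fit, a minimal sketch (reusing the Auto MPG train/test split above; the degree values are arbitrary choices) could be:

for degree in [1, 2, 3, 5]:
    model = make_pipeline(PolynomialFeatures(degree=degree), StandardScaler(), LinearRegression())
    model.fit(X_train, y_train)
    mse = mean_squared_error(y_test, model.predict(X_test))
    print(f"Degree {degree}: MSE = {mse:.3f}")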

Output:

Mean Squared Error (Linear Regression): 46.144775347317264


Program 8: Develop a program to demonstrate the working of the decision tree algorithm. Use the Breast Cancer Data set for building the decision tree and apply this knowledge to classify a new sample.
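For reference, the entropy and information-gain values printed by the code below follow the standard definitions (here the subsets S_v come from splitting each feature at its median):

    H(S) = -\sum_{c} p_c \log_2 p_c, \qquad
    IG(S, A) = H(S) - \sum_{v \in \{left,\, right\}} \frac{|S_v|}{|S|}\, H(S_v)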

import numpy as np
import pandas as pd
from sklearn.datasets import load_breast_cancer
from sklearn.tree import DecisionTreeClassifier, plot_tree
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report, accuracy_score
import matplotlib.pyplot as plt
from collections import Counter

data = load_breast_cancer()
X = data.data
y = data.target
feature_names = data.feature_names
target_names = data.target_names

print("Feature names:", feature_names)


print("Target names:", target_names)

def calculate_entropy(labels):
    total = len(labels)
    counts = Counter(labels)
    entropy = 0.0
    for count in counts.values():
        p = count / total
        entropy -= p * np.log2(p)
    return entropy

entropy_dataset = calculate_entropy(y)
print(f"\nOverall Entropy of Target (Malignant vs Benign): {entropy_dataset:.4f}")

print("\nInformation Gain for Each Feature (using median split):")


for i, feature in enumerate(feature_names):
feature_values = X[:, i]
median_value = np.median(feature_values)

# Split dataset
left_mask = feature_values <= median_value
right_mask = feature_values > median_value

y_left = y[left_mask]

Dept. of ISE, JSSATEB 2024-25 21


Machine Learning Laboratory BCSL606

y_right = y[right_mask]

entropy_left = calculate_entropy(y_left)
entropy_right = calculate_entropy(y_right)

weighted_entropy = (len(y_left) / len(y)) * entropy_left + (len(y_right) /


len(y)) * entropy_right
info_gain = entropy_dataset - weighted_entropy

print(f"{feature}: IG = {info_gain:.4f}")

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

clf = DecisionTreeClassifier(criterion='entropy', max_depth=4, random_state=42)
clf.fit(X_train, y_train)

y_pred = clf.predict(X_test)
print("\nClassification Report:\n", classification_report(y_test, y_pred))
print("Accuracy:", accuracy_score(y_test, y_pred))

plt.figure(figsize=(20, 10))
plot_tree(clf, feature_names=feature_names, class_names=target_names, filled=True, rounded=True)
plt.title("Decision Tree Visualization for Breast Cancer Dataset")
plt.show()

new_sample = np.array([[17.99, 10.38, 122.8, 1001.0, 0.1184,
                        0.2776, 0.3001, 0.1471, 0.2419, 0.07871,
                        1.095, 0.9053, 8.589, 153.4, 0.006399,
                        0.04904, 0.05373, 0.01587, 0.03003, 0.006193,
                        25.38, 17.33, 184.6, 2019.0, 0.1622,
                        0.6656, 0.7119, 0.2654, 0.4601, 0.1189]])

prediction = clf.predict(new_sample)
print("\nPrediction for new sample:")
print("Class:", target_names[prediction[0]])


Output:
Feature names: ['mean radius' 'mean texture' 'mean perimeter' 'mean area'
 'mean smoothness' 'mean compactness' 'mean concavity'
 'mean concave points' 'mean symmetry' 'mean fractal dimension'
 'radius error' 'texture error' 'perimeter error' 'area error'
 'smoothness error' 'compactness error' 'concavity error'
 'concave points error' 'symmetry error' 'fractal dimension error'
 'worst radius' 'worst texture' 'worst perimeter' 'worst area'
 'worst smoothness' 'worst compactness' 'worst concavity'
 'worst concave points' 'worst symmetry' 'worst fractal dimension']
Target names: ['malignant' 'benign']

Overall Entropy of Target (Malignant vs Benign): 0.9526

Information Gain for Each Feature (using median split):


mean radius: IG = 0.3416
mean texture: IG = 0.1445
mean perimeter: IG = 0.3507
mean area: IG = 0.3416
mean smoothness: IG = 0.0660
mean compactness: IG = 0.2325
mean concavity: IG = 0.3695
mean concave points: IG = 0.3995
mean symmetry: IG = 0.0627
mean fractal dimension: IG = 0.0000
radius error: IG = 0.1824
texture error: IG = 0.0000
perimeter error: IG = 0.2192
area error: IG = 0.2910
smoothness error: IG = 0.0023
compactness error: IG = 0.0990
concavity error: IG = 0.1601
concave points error: IG = 0.1445
symmetry error: IG = 0.0037
fractal dimension error: IG = 0.0284
worst radius: IG = 0.4588
worst texture: IG = 0.1298
worst perimeter: IG = 0.4436
worst area: IG = 0.4556
worst smoothness: IG = 0.0990
worst compactness: IG = 0.1882
worst concavity: IG = 0.3792
worst concave points: IG = 0.4209
worst symmetry: IG = 0.0762
worst fractal dimension: IG = 0.0452


Classification Report:
               precision    recall  f1-score   support

           0       0.97      0.91      0.94        43
           1       0.95      0.99      0.97        71

    accuracy                           0.96       114
   macro avg       0.96      0.95      0.95       114
weighted avg       0.96      0.96      0.96       114

Accuracy: 0.956140350877193

Prediction for new sample:


Class: malignant


Program 9: Develop a program to implement the Naive Bayesian classifier considering the Olivetti Face Data set for training. Compute the accuracy of the classifier, considering a few test data sets.

import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import fetch_olivetti_faces
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix

faces = fetch_olivetti_faces()
X = faces.data          # Flattened images: 400 x 4096
y = faces.target        # Labels: 0 to 39 (40 classes)
images = faces.images   # Original image shapes: 64 x 64

print(f"Total samples: {X.shape[0]}")


print(f"Image shape: {images[0].shape}")
print(f"Total classes: {len(np.unique(y))}")

X_train, X_test, y_train, y_test, img_train, img_test = train_test_split(
    X, y, images, test_size=0.3, random_state=42, stratify=y)

model = GaussianNB()
model.fit(X_train, y_train)

y_pred = model.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
print("\nClassification Report:")
print(classification_report(y_test, y_pred))
print("Accuracy:", accuracy)

def show_predictions(images, true_labels, predicted_labels, n=8):
    plt.figure(figsize=(15, 5))
    for i in range(n):
        plt.subplot(1, n, i + 1)
        plt.imshow(images[i], cmap='gray')
        plt.title(f"True: {true_labels[i]}\nPred: {predicted_labels[i]}")
        plt.axis('off')
    plt.tight_layout()
    plt.suptitle("Sample Test Predictions", fontsize=16)
    plt.subplots_adjust(top=0.75)
    plt.show()

show_predictions(img_test, y_test, y_pred, n=8)
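The confusion_matrix import above is otherwise unused; a minimal sketch for inspecting per-class confusions (a 40 x 40 matrix here, so a simple image view is used) could be:

cm = confusion_matrix(y_test, y_pred)
plt.figure(figsize=(8, 8))
plt.imshow(cm, cmap='Blues')          # darker cells mark more frequent (true, predicted) pairs
plt.title("Confusion Matrix (40 classes)")
plt.xlabel("Predicted label")
plt.ylabel("True label")
plt.colorbar()
plt.show()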


Output:
Classification Report:
precision recall f1-score support

0 1.00 0.67 0.80 3
1 1.00 0.67 0.80 3
2 0.43 1.00 0.60 3
3 1.00 0.33 0.50 3
4 1.00 0.33 0.50 3
5 1.00 1.00 1.00 3
6 1.00 0.67 0.80 3
7 0.60 1.00 0.75 3
8 1.00 1.00 1.00 3
9 1.00 0.33 0.50 3
10 1.00 0.67 0.80 3
11 1.00 1.00 1.00 3
12 1.00 1.00 1.00 3
13 1.00 0.67 0.80 3
14 1.00 1.00 1.00 3
15 0.50 1.00 0.67 3
16 1.00 0.33 0.50 3
17 0.00 0.00 0.00 3
18 1.00 1.00 1.00 3
19 1.00 1.00 1.00 3
20 1.00 1.00 1.00 3
21 1.00 1.00 1.00 3
22 1.00 1.00 1.00 3
23 1.00 1.00 1.00 3
24 1.00 0.67 0.80 3
25 0.75 1.00 0.86 3
26 1.00 0.67 0.80 3
27 1.00 1.00 1.00 3
28 1.00 1.00 1.00 3
29 1.00 1.00 1.00 3
30 0.75 1.00 0.86 3
31 1.00 0.67 0.80 3
32 1.00 1.00 1.00 3
33 1.00 0.67 0.80 3
34 0.43 1.00 0.60 3
35 0.75 1.00 0.86 3
36 1.00 1.00 1.00 3
37 1.00 0.33 0.50 3
38 1.00 1.00 1.00 3
39 0.33 1.00 0.50 3

accuracy 0.82 120
macro avg 0.89 0.82 0.81 120
weighted avg 0.89 0.82 0.81 120

Accuracy: 0.8166666666666667


Program 10: Develop a program to implement k-means clustering using the Wisconsin Breast Cancer data set and visualize the clustering result.

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.datasets import load_breast_cancer
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA
from sklearn.metrics import confusion_matrix, accuracy_score
from sklearn.preprocessing import StandardScaler

data = load_breast_cancer()
X = data.data
y = data.target
feature_names = data.feature_names
target_names = data.target_names

print("Data Shape:", X.shape)


print("Classes:", target_names)

scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)

kmeans = KMeans(n_clusters=2, random_state=42, n_init=10)


clusters = kmeans.fit_predict(X_scaled)

labels_mapped = np.where(clusters == 1, 0, 1)

print("\nConfusion Matrix:")
print(confusion_matrix(y, labels_mapped))
print("Accuracy:", accuracy_score(y, labels_mapped))

pca = PCA(n_components=2)
X_pca = pca.fit_transform(X_scaled)

plt.figure(figsize=(10, 6))
plt.scatter(X_pca[:, 0], X_pca[:, 1], c=clusters, cmap='viridis', alpha=0.6)
plt.scatter(kmeans.cluster_centers_[:, 0], kmeans.cluster_centers_[:, 1],
s=250, marker='X', c='red', label='Centroids')
plt.title("K-Means Clustering of Breast Cancer Dataset (PCA-2D)")
plt.xlabel("PCA Component 1")
plt.ylabel("PCA Component 2")
plt.legend()
plt.grid(True)
plt.show()
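The fixed mapping np.where(clusters == 1, 0, 1) above assumes that cluster 1 corresponds to the malignant class. A minimal sketch that derives the mapping from the data instead (majority vote of true labels within each cluster, reusing the variables defined above) could be:

labels_mapped_mv = np.zeros_like(clusters)
for c in np.unique(clusters):
    # Assign each cluster the most common true label among its members
    labels_mapped_mv[clusters == c] = np.bincount(y[clusters == c]).argmax()
print("Accuracy (majority-vote mapping):", accuracy_score(y, labels_mapped_mv))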


Output:

Data Shape: (569, 30)


Classes: ['malignant' 'benign']

Confusion Matrix:
[[176 36]
[ 18 339]]
Accuracy: 0.9050966608084359
