ML Lab File
Experiment 1
Aim :
Write a program to implement and compare different feature selection methods.
Dataset :
Breast Cancer Wisconsin Dataset
The Breast Cancer Wisconsin dataset is a popular dataset used for binary classification tasks
in machine learning. It is commonly used for research and educational purposes, particularly
in the context of medical diagnostics, as it involves determining whether a breast cancer
tumor is benign or malignant based on a set of features computed from a digitized image of
a fine needle aspirate (FNA) of a breast mass.
Dataset Composition
• Number of Instances: 569
• Number of Attributes (Features): 30 numeric features (floating-point) and 1 target
attribute.
• Target Attribute:
o 0 for benign tumors.
o 1 for malignant tumors.
Here is a detailed list of the features:
1. Radius: Mean of distances from the center to points on the perimeter.
2. Texture: Standard deviation of gray-scale values.
3. Perimeter: The total distance around the boundary of the nucleus.
4. Area: The area within the boundary of the nucleus.
5. Smoothness: Local variation in radius lengths.
6. Compactness: Perimeter² / area - 1.0.
7. Concavity: Severity of concave portions of the contour.
8. Concave Points: Number of concave portions of the contour.
9. Symmetry: Symmetry of the nucleus.
10. Fractal Dimension: "Coastline approximation" - 1 (roughness).
Each of these 10 basic features is computed in three different ways (mean, standard error,
and "worst" or largest). Thus, there are 30 features in total.
Dataset Format
The dataset is usually presented in a tabular format with the following columns:
• ID Number: Unique identifier for each instance.
• Diagnosis: Target class (M = malignant, B = benign).
• 30 Feature Columns: As described above.
Algorithm :
Feature selection (also known as variable selection, attribute selection or subset selection) is
the process of selecting a subset of relevant features to use in machine learning model
building. It is one of the core concepts in machine learning which has a huge impact on
models’ performance. Given a pool of features, the process selects the subset of attributes
that are most important and contribute the most at prediction time. For this experiment, we
consider the following feature selection methods :
1. Filter Methods: Rely on the features’ characteristics without using any machine
learning algorithm. Very well suited for a quick screening and removal of irrelevant
features (see the sketch after this list).
a. Spearman Correlation
b. Pearson Correlation
c. Kendall Correlation
d. Chi Squared Test
e. Information Gain
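For example, the chi-squared test and information gain (estimated via mutual information) can be applied with scikit-learn's SelectKBest. A minimal sketch using the X_scaled, X_normalized and y variables prepared in the program below; k=10 is an illustrative choice, and the chi-squared test requires non-negative inputs such as min-max scaled data:
from sklearn.feature_selection import SelectKBest, chi2, mutual_info_classif
# Chi-squared test on non-negative (min-max scaled) features
chi2_selector = SelectKBest(score_func=chi2, k=10).fit(X_normalized, y)
chi2_features = np.where(chi2_selector.get_support())[0]
# Information gain, estimated via mutual information between each feature and the target
mi_selector = SelectKBest(score_func=mutual_info_classif, k=10).fit(X_scaled, y)
mi_features = np.where(mi_selector.get_support())[0]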
Program :
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder, StandardScaler, MinMaxScaler
from sklearn.feature_selection import mutual_info_classif, chi2, SelectKBest
from sklearn.feature_selection import f_classif
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score
# Load the Breast Cancer Wisconsin dataset (same CSV as in the later experiments)
file_path = r'C:\Users\tulik\Desktop\IGDTUW\ML\ML Lab\Lab Datasets\breast_cancer_wisconsin.csv'
data = pd.read_csv(file_path)
data = data.dropna()
X = data.iloc[:, :-1]
y = data.iloc[:, -1]
if y.dtype == 'O':
    label_encoder = LabelEncoder()
    y = label_encoder.fit_transform(y)
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)
min_max_scaler = MinMaxScaler()
X_normalized = min_max_scaler.fit_transform(X)
selected_features = {}
# Correlate each feature with the target (rather than with another feature)
y_series = pd.Series(y, index=X.index)
# Pearson Correlation
pearson_corr = X.corrwith(y_series, method='pearson').abs()
selected_features['Pearson'] = np.where(pearson_corr > 0.4)[0]
# Spearman Correlation
spearman_corr = X.corrwith(y_series, method='spearman').abs()
selected_features['Spearman'] = np.where(spearman_corr > 0.4)[0]
# Kendall Correlation
kendall_corr = X.corrwith(y_series, method='kendall').abs()
selected_features['Kendall'] = np.where(kendall_corr > 0.4)[0]
# Wrapper methods (the estimator and number of selected features are illustrative choices)
rfe_selector = RFE(estimator=LogisticRegression(max_iter=1000), n_features_to_select=10)
rfe_selector = rfe_selector.fit(X_scaled, y)
selected_features['Forward_Selection'] = np.where(rfe_selector.support_)[0]
k_best_selector = SelectKBest(score_func=f_classif, k=10)
k_best_selector.fit(X_scaled, y)
selected_features['Backward_Elimination'] = np.where(k_best_selector.get_support())[0]
# Train a Naive Bayes classifier on each selected feature subset and record its test accuracy
accuracy_rows = []
for method, feature_idx in selected_features.items():
    X_train, X_test, y_train, y_test = train_test_split(X_scaled[:, feature_idx], y, test_size=0.3, random_state=42)
    nb_model = GaussianNB().fit(X_train, y_train)
    accuracy_rows.append({'Feature Selection Method': method,
                          'Accuracy': accuracy_score(y_test, nb_model.predict(X_test))})
accuracy_df = pd.DataFrame(accuracy_rows)
plt.figure(figsize=(10, 6))
sns.barplot(x='Feature Selection Method', y='Accuracy', data=accuracy_df, palette='viridis')
plt.title('Accuracy of Naive Bayes Classifier with Different Feature Selection Methods')
plt.xlabel('Feature Selection Method')
plt.ylabel('Accuracy')
plt.xticks(rotation=45, ha='right')
plt.ylim(0, 1)
plt.tight_layout()
plt.show()
Output :
Selected Features via Pearson, Spearman and Kendall Correlation (Thresholds : 0.6, 0.6, 0.4):
Selected Features via Forward Feature Selection, Backward Feature Elimination, Step Wise
Selection :
A comparison of different evaluation metrics for the Naïve Bayes classifier :
Conclusion :
We can see that forward feature selection performs the best of all the methods considered.
Experiment 2
Aim :
Write a program to implement linear regression without using python libraries.
Dataset :
Student Performance Dataset contains two columns: Hours and Scores.
• Hours: Represents the number of hours spent studying or practicing (independent
variable or feature).
• Scores: Represents the scores achieved (dependent variable or target).
Algorithm :
Linear regression is used for finding a linear relationship between the target and one or more
predictors. There are two types of linear regression: simple linear regression, which uses a
single predictor, and multiple linear regression, which uses two or more predictors. In simple
linear regression, the best-fit line y = b0 + b1*x is obtained from the least-squares estimates
b1 = Σ(x − x̄)(y − ȳ) / Σ(x − x̄)² and b0 = ȳ − b1·x̄.
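A minimal sketch of these closed-form least-squares estimates, which the program below also computes on the hours and scores arrays:
import numpy as np

def least_squares_fit(x, y):
    # Closed-form least-squares estimates for simple linear regression
    mean_x, mean_y = np.mean(x), np.mean(y)
    slope = np.sum((x - mean_x) * (y - mean_y)) / np.sum((x - mean_x) ** 2)
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Example: points on the line y = 2x + 1 recover slope 2 and intercept 1
print(least_squares_fit(np.array([1, 2, 3, 4]), np.array([3, 5, 7, 9])))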
Program :
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
student_scores = pd.read_csv(r'C:\Users\tulik\Desktop\IGDTUW\ML\ML Lab\Lab Datasets\student_scores.csv')
student_scores.isnull().sum()
print(student_scores)
corr_matrix = student_scores.corr()
print(corr_matrix)
x=student_scores.Hours.values.reshape(-1,1)
y=student_scores.Scores.values.reshape(-1,1)
plt.scatter(x,y,color='blue')
plt.xlabel('Number of Hours studied')
plt.ylabel('Scores Obtained')
plt.title('Simple Linear graph')
plt.show()
hours = student_scores['Hours']
scores = student_scores['Scores']
hours = np.array(hours)
scores = np.array(scores)
mean_x = np.mean(hours)
mean_y = np.mean(scores)
# Least-squares estimates of the slope and intercept
slope = np.sum((hours - mean_x) * (scores - mean_y)) / np.sum((hours - mean_x) ** 2)
intercept = mean_y - slope * mean_x
def predict(x):
    return slope * x + intercept
predictions = predict(hours)
print("\nPredictions:")
for h, p in zip(hours, predictions):
    print(f"Hours: {h}, Predicted Score: {p}")
plt.figure(figsize=(10, 6))
plt.scatter(hours, scores, color='blue', label='Actual Scores')
plt.plot(hours, predictions, color='red', linewidth=2, label='Fitted Line')
plt.xlabel('Hours')
plt.ylabel('Scores')
plt.title('Actual vs Predicted Scores')
plt.legend()
plt.grid(True)
plt.show()
Output :
Conclusion :
The manually implemented linear regression model produces predictions close to the actual
scores, which corresponds to an accuracy of 95.2%.
Experiment 3
Aim :
Write a program to implement logistic regression without using python libraries.
Dataset :
Breast Cancer Wisconsin Dataset
The Breast Cancer Wisconsin dataset is a popular dataset used for binary classification tasks
in machine learning. It is commonly used for research and educational purposes, particularly
in the context of medical diagnostics, as it involves determining whether a breast cancer
tumor is benign or malignant based on a set of features computed from a digitized image of
a fine needle aspirate (FNA) of a breast mass.
Dataset Composition
• Number of Instances: 569
• Number of Attributes (Features): 30 numeric features (floating-point) and 1 target
attribute.
• Target Attribute:
o 0 for benign tumors.
o 1 for malignant tumors.
Here is a detailed list of the features:
1. Radius: Mean of distances from the center to points on the perimeter.
2. Texture: Standard deviation of gray-scale values.
3. Perimeter: The total distance around the boundary of the nucleus.
4. Area: The area within the boundary of the nucleus.
5. Smoothness: Local variation in radius lengths.
6. Compactness: Perimeter² / area - 1.0.
7. Concavity: Severity of concave portions of the contour.
8. Concave Points: Number of concave portions of the contour.
9. Symmetry: Symmetry of the nucleus.
10. Fractal Dimension: "Coastline approximation" - 1 (roughness).
Each of these 10 basic features is computed in three different ways (mean, standard error,
and "worst" or largest). Thus, there are 30 features in total.
Dataset Format
The dataset is usually presented in a tabular format with the following columns:
• ID Number: Unique identifier for each instance.
• Diagnosis: Target class (M = malignant, B = benign).
• 30 Feature Columns: As described above.
Algorithm :
The logistic function, also called the sigmoid function, is an S-shaped curve that can take any
real-valued number and map it into a value between 0 and 1, but never exactly at those
limits: sigmoid(value) = 1 / (1 + e^(-value)). The key difference from linear regression is that
the output value being modeled is a binary value (0 or 1) rather than a numeric value.
Logistic regression is named for the function used at the core of the method, the logistic function.
Based on the number of categories, Logistic regression can be classified as:
1. binomial: target variable can have only 2 possible types: “0” or “1” which may
represent “win” vs “loss”, “pass” vs “fail”, “dead” vs “alive”, etc.
2. multinomial: target variable can have 3 or more possible types which are not
ordered (i.e. the types have no quantitative significance), like “disease A” vs “disease B” vs
“disease C”.
3. ordinal: it deals with target variables with ordered categories. For example, a test
score can be categorized as: “very poor”, “poor”, “good”, “very good”. Here, each
category can be given a score like 0, 1, 2, 3.
Logistic regression uses an equation as the representation, very much like linear regression.
Input values (x) are combined linearly using weights or coefficient values (referred to as the
Greek capital letter Beta) to predict an output value (y). Below is an example logistic
regression equation: y = e^(b0 + b1*x) / (1 + e^(b0 + b1*x)) where y is the predicted output,
b0 is the bias or intercept term and b1 is the coefficient for the single input value (x).
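As a small illustration with hypothetical (not fitted) coefficients, b0 = -1 and b1 = 0.5 map the input x = 4 to a probability of about 0.73, which a 0.5 threshold would classify as class 1:
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

b0, b1 = -1.0, 0.5             # illustrative coefficients, not fitted values
x = 4.0
y_hat = sigmoid(b0 + b1 * x)   # ≈ 0.731
label = int(y_hat >= 0.5)      # classified as 1 with a 0.5 threshold
print(y_hat, label)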
Program :
import numpy as np
import pandas as pd

# Load the dataset; as in the other experiments in this file, the last column is taken as the target
data = pd.read_csv(r'C:\Users\tulik\Desktop\IGDTUW\ML\ML Lab\Lab Datasets\breast_cancer_wisconsin.csv').dropna()
X = data.iloc[:, :-1]
y = data.iloc[:, -1]
if y.dtype == 'O':
    y = (y == 'M').astype(int)  # encode diagnosis as 1 = malignant, 0 = benign
y = np.asarray(y, dtype=float)
m = len(y)  # number of training examples
# Normalize features
X = (X - X.mean()) / X.std()
# Add a column of ones to X for the intercept term
X = np.c_[np.ones(X.shape[0]), X]  # Adding the intercept term
def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def compute_cost(X, y, theta):
    # Binary cross-entropy cost
    h = sigmoid(X.dot(theta))
    return -(1 / m) * np.sum(y * np.log(h + 1e-15) + (1 - y) * np.log(1 - h + 1e-15))

# Initialize parameters
theta = np.zeros(X.shape[1])
alpha = 0.01  # Learning rate
num_iters = 1000  # Number of iterations
cost_history = np.zeros(num_iters)

# Gradient descent
for i in range(num_iters):
    h = sigmoid(X.dot(theta))
    gradient = (1 / m) * X.T.dot(h - y)
    theta -= alpha * gradient
    cost_history[i] = compute_cost(X, y, theta)
# Output results
print("Optimized theta:", theta)
print("Final cost:", cost_history[-1])
# Calculate training accuracy with a 0.5 decision threshold
predictions = (sigmoid(X.dot(theta)) >= 0.5).astype(int)
accuracy = np.mean(predictions == y)
print("Training accuracy:", accuracy)
Output :
The optimized cost or theta via gradient descent at every iteration is as follows :
The predictions made from our manual logistic regression model look something like :
Conclusion :
Our manual logistic regression algorithm classifies the tumors as malignant or benign and
achieves an accuracy of 98.2%.
Experiment 4
Aim :
Write a program to implement the support Vector machine algorithm.
Dataset :
Breast Cancer Wisconsin Dataset
The Breast Cancer Wisconsin dataset is a popular dataset used for binary classification tasks
in machine learning. It is commonly used for research and educational purposes, particularly
in the context of medical diagnostics, as it involves determining whether a breast cancer
tumor is benign or malignant based on a set of features computed from a digitized image of
a fine needle aspirate (FNA) of a breast mass.
Dataset Composition
• Number of Instances: 569
• Number of Attributes (Features): 30 numeric features (floating-point) and 1 target
attribute.
• Target Attribute:
o 0 for benign tumors.
o 1 for malignant tumors.
Here is a detailed list of the features:
1. Radius: Mean of distances from the center to points on the perimeter.
2. Texture: Standard deviation of gray-scale values.
3. Perimeter: The total distance around the boundary of the nucleus.
4. Area: The area within the boundary of the nucleus.
5. Smoothness: Local variation in radius lengths.
6. Compactness: Perimeter² / area - 1.0.
7. Concavity: Severity of concave portions of the contour.
8. Concave Points: Number of concave portions of the contour.
9. Symmetry: Symmetry of the nucleus.
10. Fractal Dimension: "Coastline approximation" - 1 (roughness).
Each of these 10 basic features is computed in three different ways (mean, standard error,
and "worst" or largest). Thus, there are 30 features in total.
Dataset Format
The dataset is usually presented in a tabular format with the following columns:
• ID Number: Unique identifier for each instance.
• Diagnosis: Target class (M = malignant, B = benign).
• 30 Feature Columns: As described above.
Algorithm :
Support Vector Machine (SVM) is a supervised machine learning algorithm commonly used
for classification tasks. It works by finding an optimal hyperplane that best separates the
data into different classes. For this experiment, SVM will attempt to classify tumors as either
malignant (cancerous) or benign (non-cancerous). The objective of the SVM is to maximize
the margin between data points of different classes, thus creating a robust classifier.
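As an aside on the margin, scikit-learn's SVC exposes the regularization parameter C, which trades margin width against training errors; the values and kernel below are illustrative:
from sklearn.svm import SVC

# Smaller C -> wider (softer) margin, more tolerant of misclassified training points
soft_margin_svm = SVC(kernel='linear', C=0.1)
# Larger C -> narrower margin, fits the training data more tightly
hard_margin_svm = SVC(kernel='linear', C=100.0)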
Program :
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder, StandardScaler, MinMaxScaler
import seaborn as sns
import matplotlib.pyplot as plt
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix
file_path = r'C:\Users\tulik\Desktop\IGDTUW\ML\ML Lab\Lab Datasets\breast_cancer_wisconsin.csv'  # Update this with your file path
data = pd.read_csv(file_path)
data.head()
data.shape
data = data.dropna()
label_encoders = {}
for column in data.select_dtypes(include=['object']).columns:
    le = LabelEncoder()
    data[column] = le.fit_transform(data[column])
    label_encoders[column] = le
X = data.iloc[:, :-1]
y = data.iloc[:, -1]
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)
min_max_scaler = MinMaxScaler()
X_normalized = min_max_scaler.fit_transform(X)
X.corr()
plt.figure(figsize=(18, 12))
sns.heatmap(X.corr(), vmin=0.85, vmax=1, annot=True, cmap='YlGnBu', linewidths=.5)
correlation_matrix = X.corr()
# Correlation of each feature with the target variable
correlation_with_target = X.corrwith(y).abs()
# Select features with correlation > 0.5
selected_features = correlation_with_target[correlation_with_target > 0.5].index
selected_features
features = ['radius_mean', 'perimeter_mean', 'area_mean', 'compactness_mean',
            'concavity_mean', 'concave points_mean', 'radius_se', 'perimeter_se',
            'area_se', 'radius_worst', 'perimeter_worst', 'area_worst',
            'compactness_worst', 'concavity_worst', 'concave points_worst']
X = X[features]
print(X.columns)
correlation_matrix = X.corr()
high_corr_pairs = []
for i in range(len(correlation_matrix.columns)):
    for j in range(i):
        if correlation_matrix.iloc[i, j] > 0.9:
            high_corr_pairs.append((correlation_matrix.columns[i], correlation_matrix.columns[j]))
# Drop one feature from each highly correlated pair
X = X.drop(columns={pair[0] for pair in high_corr_pairs})

# Train/test split, scaling and SVM training (kernel and split size are illustrative choices)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)
svm_model = SVC(kernel='rbf', random_state=42)
svm_model.fit(X_train, y_train)
y_pred = svm_model.predict(X_test)

# Evaluate the classifier
accuracy = accuracy_score(y_test, y_pred)
classification_rep = classification_report(y_test, y_pred)
confusion_mat = confusion_matrix(y_test, y_pred)
print("Accuracy:", accuracy)
print("\nClassification Report:\n", classification_rep)
print("\nConfusion Matrix:\n", confusion_mat)
Output :
The correlation map of all the features looks like the one below. We took the features which
had a correlation of 0.5 or more with the target variable and for those selected features, we
dropped the ones which had higher correlation between each other (above 0.9).
The final output consists of the binary predictions made on the test data, the accuracy, the
confusion matrix, precision, recall, F1 score and support :
Conclusion :
The SVM model on the Breast Cancer Wisconsin dataset performs with high accuracy (97%),
demonstrating strong capability in distinguishing between malignant and benign tumors.
The high precision and recall scores reflect the model's reliability in clinical contexts, where
accurate classification is essential.
Experiment 5
Aim :
Write a program to implement back propagation neural network for classification of sample data
without using Python libraries
Dataset :
The Iris dataset is a well-known dataset in machine learning and statistics, often used for
classification tasks. It consists of 150 samples of iris flowers from three different species:
Setosa, Versicolor, and Virginica. The dataset includes 4 features, which are the
measurements of the flowers' sepals and petals:
1. Sepal Length: The length of the sepal in centimeters.
2. Sepal Width: The width of the sepal in centimeters.
3. Petal Length: The length of the petal in centimeters.
4. Petal Width: The width of the petal in centimeters.
The dataset also includes the Species label, which identifies the species of each iris flower as
either Setosa, Versicolor, or Virginica. The dataset is balanced, with 50 samples from each
species.
• Number of instances: 150
• Number of attributes: 4 features (sepal length, sepal width, petal length, petal
width)
• Classes: 3 species (Setosa, Versicolor, Virginica)
Algorithm :
Backpropagation Network
• It is a multilayer feed forward network
• BPN has two phases:
• Forward pass phase: computes ‘functional signal’, feed forward propagation of input
pattern signals through network
• Backward pass phase: computes ‘error signal’, propagates the error backwards
through network starting at output units (where the error is the difference between
actual and desired output values)
In this implementation, the neural network consists of:
• Input layer: 4 neurons (corresponding to the 4 input features).
• Hidden layers: 2 hidden layers (with 10 and 8 neurons respectively).
• Output layer: 3 neurons (corresponding to the 3 possible species of the iris).
The network is trained for a specified number of epochs (3000 in the updated version),
during which the weights are updated using the backpropagation algorithm to reduce the
classification error.
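The backward pass is not reproduced in the listing below; a minimal sketch of the output-layer update under sigmoid activations, where the function and variable names are illustrative and mirror the network described above:
import numpy as np

def sigmoid_derivative(a):
    # Derivative of the sigmoid written in terms of its output a = sigmoid(z)
    return a * (1 - a)

def output_layer_update(hidden_output, final_output, y_true, weights, bias, learning_rate):
    # Error signal at the output layer: (prediction - target) scaled by the activation slope
    delta = (final_output - y_true) * sigmoid_derivative(final_output)
    # Move weights and biases against the gradient of the error
    weights -= learning_rate * hidden_output.T.dot(delta)
    bias -= learning_rate * delta.sum(axis=0)
    return weights, bias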
Program :
import numpy as np
import pandas as pd
from sklearn.preprocessing import LabelEncoder, OneHotEncoder, StandardScaler
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
def load_data():
    # Read the Iris dataset (the file path is illustrative; adjust it to your local copy)
    data = pd.read_csv('Iris.csv')
    data = data.drop(columns=['Id'], errors='ignore')  # drop the Id column, as noted in the Output section
    X = data.iloc[:, :-1].values
    y = data.iloc[:, -1].values
    label_encoder = LabelEncoder()
    y = label_encoder.fit_transform(y)
    y = OneHotEncoder(sparse_output=False).fit_transform(y.reshape(-1, 1))
    scaler = StandardScaler()
    X = scaler.fit_transform(X)
    return X, y
self.bias_hidden1 = np.random.rand(hidden_size1)
self.bias_hidden2 = np.random.rand(hidden_size2)
self.bias_output = np.random.rand(output_size)
self.hidden_input2 = np.dot(self.hidden_output1, self.weights_hidden1_hidden2) + self.bias_hidden2
self.hidden_output2 = self.sigmoid(self.hidden_input2)
self.final_input = np.dot(self.hidden_output2, self.weights_hidden2_output) + self.bias_output
self.final_output = self.sigmoid(self.final_input)
return self.final_output
X, y = load_data()
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)
input_size = X_train.shape[1]
hidden_size1 = 10 # First hidden layer neurons
hidden_size2 = 8 # Second hidden layer neurons
output_size = y_train.shape[1]
learning_rate = 0.1
epochs = 3000
y_pred = nn.predict(X_test)
print(y_pred)
y_test_labels = np.argmax(y_test, axis=1)
accuracy = accuracy_score(y_test_labels, y_pred)
Output :
After removing the ‘Id’ feature, the test predictions and the accuracy are produced as output.
• Upon using just one hidden layer and the following attribute values for the neural
network, the accuracy came out to be 71% :
hidden_size = 5
learning_rate = 0.1
epochs = 1000
• When we increase the hidden layers to 2 and use the following attribute values for
the neural network, the accuracy came out to be 91% :
hidden_size1 = 10 # First hidden layer neurons
hidden_size2 = 8 # Second hidden layer neurons
epochs = 3000 # Increase epochs for better training
Conclusion :
In this implementation, we used a backpropagation neural network to classify iris flowers
from the Iris dataset. The neural network was able to achieve an accuracy of around 71%
initially, and after improving the model by:
• Adding more neurons in the hidden layers,
• Introducing a second hidden layer, and
• Scaling the features using StandardScaler,
the accuracy was improved to 91%. With the improved model (more neurons, more epochs,
and better feature scaling), we observed an increase in performance, which demonstrates
the importance of tuning the model architecture and training parameters.
Experiment 6
Aim :
Write a program to implement the Principal Component Analysis technique of dimensionality
reduction and evaluate the performance with a classifier.
Dataset :
Breast Cancer Wisconsin Dataset
The Breast Cancer Wisconsin dataset is a popular dataset used for binary classification tasks
in machine learning. It is commonly used for research and educational purposes, particularly
in the context of medical diagnostics, as it involves determining whether a breast cancer
tumor is benign or malignant based on a set of features computed from a digitized image of
a fine needle aspirate (FNA) of a breast mass.
Dataset Composition
• Number of Instances: 569
• Number of Attributes (Features): 30 numeric features (floating-point) and 1 target
attribute.
• Target Attribute:
o 0 for benign tumors.
o 1 for malignant tumors.
Here is a detailed list of the features:
1. Radius: Mean of distances from the center to points on the perimeter.
2. Texture: Standard deviation of gray-scale values.
3. Perimeter: The total distance around the boundary of the nucleus.
4. Area: The area within the boundary of the nucleus.
5. Smoothness: Local variation in radius lengths.
6. Compactness: Perimeter² / area - 1.0.
7. Concavity: Severity of concave portions of the contour.
8. Concave Points: Number of concave portions of the contour.
9. Symmetry: Symmetry of the nucleus.
10. Fractal Dimension: "Coastline approximation" - 1 (roughness).
Each of these 10 basic features is computed in three different ways (mean, standard error,
and "worst" or largest). Thus, there are 30 features in total.
Dataset Format
The dataset is usually presented in a tabular format with the following columns:
• ID Number: Unique identifier for each instance.
• Diagnosis: Target class (M = malignant, B = benign).
• 30 Feature Columns: As described above.
Algorithm :
Principal Component Analysis :
• The main idea of principal component analysis (PCA) is to reduce the dimensionality
of a data set consisting of many variables correlated with each other, either heavily
or lightly, while retaining the variation present in the dataset, up to the maximum
extent.
• The same is done by transforming the variables to a new set of variables, which are
known as the principal components (or simply, the PCs) and are orthogonal, ordered
such that the retention of variation present in the original variables decreases as we
move down in the order.
• So, in this way, the 1st principal component retains maximum variation that was
present in the original components.
• The principal components are the eigenvectors of a covariance matrix, and hence
they are orthogonal.
• A principal component can be defined as a linear combination of optimally weighted
observed variables.
• The outputs of PCA are these principal components, whose number is equal to the
number of original variables.
• The PCs possess some useful properties which are listed below:
o The PCs are essentially linear combinations of the original variables; the
weight vector in each combination is the corresponding eigenvector, which in
turn satisfies the principle of least squares.
o The PCs are orthogonal.
o The variation present in the PCs decreases as we move from the 1st PC to the
last one, hence the ordering by importance.
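A minimal sketch of this idea with NumPy, treating the principal components as the eigenvectors of the covariance matrix of a standardized feature matrix X_std (an assumed input, e.g. the scaled breast cancer features):
import numpy as np

def pca_transform(X_std, n_components=2):
    # Covariance matrix of the standardized features
    cov = np.cov(X_std, rowvar=False)
    # Its eigenvectors are the principal components
    eigvals, eigvecs = np.linalg.eigh(cov)
    # Sort by decreasing eigenvalue so the 1st PC retains the most variance
    order = np.argsort(eigvals)[::-1][:n_components]
    components = eigvecs[:, order]
    explained_ratio = eigvals[order] / eigvals.sum()
    return X_std @ components, explained_ratio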
Program :
import numpy as np
import pandas as pd
from sklearn.decomposition import PCA
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.preprocessing import StandardScaler, LabelEncoder
from sklearn.metrics import classification_report, accuracy_score
from sklearn.pipeline import make_pipeline
# Load the Breast Cancer Wisconsin (Diagnostic) dataset
file_path = r'C:\Users\tulik\Desktop\IGDTUW\ML\ML Lab\Lab Datasets\breast_cancer_wisconsin.csv'  # Update this with your file path
data = pd.read_csv(file_path)
data = data.dropna()
label_encoders = {}
for column in data.select_dtypes(include=['object']).columns:
    le = LabelEncoder()
    data[column] = le.fit_transform(data[column])
    label_encoders[column] = le
X = data.iloc[:, :-1]
y = data.iloc[:, -1]
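The listing above ends before the PCA and classification steps; a minimal sketch of one possible completion, where the number of components, kernel and split size are illustrative choices:
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)
# Pipeline: standardize -> project onto 10 principal components -> SVM classifier
pca_svm = make_pipeline(StandardScaler(), PCA(n_components=10), SVC(kernel='rbf', random_state=42))
pca_svm.fit(X_train, y_train)
y_pred = pca_svm.predict(X_test)
print("Accuracy with PCA:", accuracy_score(y_test, y_pred))
print(classification_report(y_test, y_pred))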
Conclusion :
• PCA helps reduce the feature space, which can speed up the model training time
and help with overfitting.
• SVM works well with the dataset both with and without PCA. The performance
might remain similar, but with reduced features, the model might become more
generalizable, especially when dealing with noisy or redundant features.
Experiment 7
Aim :
Write a program to implement the ID3 algorithm.
Dataset :
Breast Cancer Wisconsin Dataset
The Breast Cancer Wisconsin dataset is a popular dataset used for binary classification tasks
in machine learning. It is commonly used for research and educational purposes, particularly
in the context of medical diagnostics, as it involves determining whether a breast cancer
tumor is benign or malignant based on a set of features computed from a digitized image of
a fine needle aspirate (FNA) of a breast mass.
Dataset Composition
• Number of Instances: 569
• Number of Attributes (Features): 30 numeric features (floating-point) and 1 target
attribute.
• Target Attribute:
o 0 for benign tumors.
o 1 for malignant tumors.
Here is a detailed list of the features:
1. Radius: Mean of distances from the center to points on the perimeter.
2. Texture: Standard deviation of gray-scale values.
3. Perimeter: The total distance around the boundary of the nucleus.
4. Area: The area within the boundary of the nucleus.
5. Smoothness: Local variation in radius lengths.
6. Compactness: Perimeter² / area - 1.0.
7. Concavity: Severity of concave portions of the contour.
8. Concave Points: Number of concave portions of the contour.
9. Symmetry: Symmetry of the nucleus.
10. Fractal Dimension: "Coastline approximation" - 1 (roughness).
Each of these 10 basic features is computed in three different ways (mean, standard error,
and "worst" or largest). Thus, there are 30 features in total.
Dataset Format
The dataset is usually presented in a tabular format with the following columns:
• ID Number: Unique identifier for each instance.
• Diagnosis: Target class (M = malignant, B = benign).
• 30 Feature Columns: As described above.
Algorithm :
ID3 Algorithm :
• ID3 stands for Iterative Dichotomiser 3.
• It uses the notion of information gain, which is defined in terms of entropy.
• It is a classification algorithm that follows a greedy approach to building a decision
tree, at each step selecting the attribute that yields the maximum Information Gain (IG).
• It attempts to create the shortest possible decision tree.
Input Data: The dataset consists of instances with attributes (features) and labels (class).
Steps:
• Calculate Entropy: Entropy is a measure of impurity in the dataset. The goal is to
reduce this impurity with each split.
• Information Gain: For each attribute, calculate the information gain by splitting the
data based on that attribute. The attribute with the highest information gain will be
chosen for the split.
• Build Tree Recursively: The ID3 algorithm recursively builds a decision tree by
selecting the best attribute at each level until all data is classified or a stopping
criterion is met (such as when all data is pure or no more attributes are left).
Stopping Criteria:
• All instances in the dataset belong to the same class.
• There are no remaining attributes to split on.
• All instances have the same attribute values.
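A minimal sketch of the entropy and information-gain computations the algorithm relies on (data is assumed to be a pandas DataFrame and target the name of the label column, mirroring the program below):
import numpy as np
import pandas as pd

def entropy(labels):
    # Entropy of a label column: -sum(p * log2(p)) over the class proportions
    proportions = labels.value_counts(normalize=True)
    return -np.sum(proportions * np.log2(proportions))

def information_gain(data, feature, target):
    # Reduction in entropy obtained by splitting the data on `feature`
    total_entropy = entropy(data[target])
    weighted_entropy = sum((len(subset) / len(data)) * entropy(subset[target])
                           for _, subset in data.groupby(feature))
    return total_entropy - weighted_entropy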
Program :
import pandas as pd
import numpy as np
from collections import Counter
# Base Case 2: If there are no more features to split on, return the majority class
if len(features) == 0:
    return data[target].mode()[0]
return tree
def tree_depth(tree):
    # If the tree is a leaf node (i.e., a class label), the depth is 0
    if not isinstance(tree, dict):
        return 0
# Example Usage:
# 1. Load data
file_path = r'C:\Users\tulik\Desktop\IGDTUW\ML\ML Lab\Lab Datasets\breast_cancer_wisconsin.csv'
data = load_data(file_path)
Output :
The ID3 classifier is successfully able to classify the samples into malignant (M) or benign (B)
tumors.
Conclusion :
The ID3 decision tree built using information gain is able to classify the tumor samples as
malignant or benign.
Experiment 8
Aim :
Write a program to implement the Random Forest algorithm.
Dataset :
Breast Cancer Wisconsin Dataset
The Breast Cancer Wisconsin dataset is a popular dataset used for binary classification tasks
in machine learning. It is commonly used for research and educational purposes, particularly
in the context of medical diagnostics, as it involves determining whether a breast cancer
tumor is benign or malignant based on a set of features computed from a digitized image of
a fine needle aspirate (FNA) of a breast mass.
Dataset Composition
• Number of Instances: 569
• Number of Attributes (Features): 30 numeric features (floating-point) and 1 target
attribute.
• Target Attribute:
o 0 for benign tumors.
o 1 for malignant tumors.
Here is a detailed list of the features:
1. Radius: Mean of distances from the center to points on the perimeter.
2. Texture: Standard deviation of gray-scale values.
3. Perimeter: The total distance around the boundary of the nucleus.
4. Area: The area within the boundary of the nucleus.
5. Smoothness: Local variation in radius lengths.
6. Compactness: Perimeter² / area - 1.0.
7. Concavity: Severity of concave portions of the contour.
8. Concave Points: Number of concave portions of the contour.
9. Symmetry: Symmetry of the nucleus.
10. Fractal Dimension: "Coastline approximation" - 1 (roughness).
Each of these 10 basic features is computed in three different ways (mean, standard error,
and "worst" or largest). Thus, there are 30 features in total.
Dataset Format
The dataset is usually presented in a tabular format with the following columns:
• ID Number: Unique identifier for each instance.
• Diagnosis: Target class (M = malignant, B = benign).
• 30 Feature Columns: As described above.
Algorithm :
Random Forest is an ensemble learning method that operates by building multiple decision
trees during training and outputting the mode of the classes (for classification) or mean
prediction (for regression) of the individual trees. This technique is used primarily for
classification and regression tasks.
Key steps in the Random Forest algorithm:
1. Data Bootstrapping: Randomly select subsets of the training data (with replacement)
for each tree.
2. Feature Randomization: At each node of a tree, a random subset of features is
considered for splitting, which helps reduce the correlation between trees and
increases the diversity of the model.
3. Training Trees: Build multiple decision trees using different subsets of the data and
features.
4. Voting/Averaging: After all trees are built, combine their predictions through voting
(for classification) or averaging (for regression) to obtain the final output.
The main advantages of Random Forest are its robustness to overfitting and its ability to
handle large datasets with higher dimensionality.
Program :
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, classification_report
from sklearn.preprocessing import StandardScaler, LabelEncoder
import matplotlib.pyplot as plt
from sklearn.datasets import load_breast_cancer
# Load the dataset (same CSV as in the earlier experiments)
data = pd.read_csv(r'C:\Users\tulik\Desktop\IGDTUW\ML\ML Lab\Lab Datasets\breast_cancer_wisconsin.csv')
data = data.dropna()
label_encoders = {}
for column in data.select_dtypes(include=['object']).columns:
    le = LabelEncoder()
    data[column] = le.fit_transform(data[column])
    label_encoders[column] = le
X = data.iloc[:, :-1]
y = data.iloc[:, -1]
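The listing above stops before the model is trained; a minimal sketch of one possible completion, where the number of trees and the 70/30 split are illustrative choices:
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)
rf = RandomForestClassifier(n_estimators=100, random_state=42)
rf.fit(X_train, y_train)
y_pred = rf.predict(X_test)
print("Accuracy:", accuracy_score(y_test, y_pred))
print(classification_report(y_test, y_pred))
# Feature importances derived from the trained forest
importances = pd.Series(rf.feature_importances_, index=X.columns).sort_values()
importances.plot(kind='barh', figsize=(8, 10))
plt.title('Random Forest Feature Importances')
plt.show()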
Output :
The accuracy comes out to be 96.4% with the random forest classifier. The feature importances
derived by this model are also shown below.
Conclusion :
Accuracy: The model provides a good accuracy on the test set. Random Forest generally
works well even with a relatively small number of trees and performs better than individual
decision trees due to its ensemble nature.
Feature Importance: By analyzing the feature importance, you can gain insights into which
features are most influential in the classification decision. This is helpful in identifying
important variables for further analysis or feature engineering.
Experiment 9
Aim :
Write a program to implement the K- nearest neighbor algorithm.
Dataset :
Breast Cancer Wisconsin Dataset
The Breast Cancer Wisconsin dataset is a popular dataset used for binary classification tasks
in machine learning. It is commonly used for research and educational purposes, particularly
in the context of medical diagnostics, as it involves determining whether a breast cancer
tumor is benign or malignant based on a set of features computed from a digitized image of
a fine needle aspirate (FNA) of a breast mass.
Dataset Composition
• Number of Instances: 569
• Number of Attributes (Features): 30 numeric features (floating-point) and 1 target
attribute.
• Target Attribute:
o 0 for benign tumors.
o 1 for malignant tumors.
Here is a detailed list of the features:
1. Radius: Mean of distances from the center to points on the perimeter.
2. Texture: Standard deviation of gray-scale values.
3. Perimeter: The total distance around the boundary of the nucleus.
4. Area: The area within the boundary of the nucleus.
5. Smoothness: Local variation in radius lengths.
6. Compactness: Perimeter² / area - 1.0.
7. Concavity: Severity of concave portions of the contour.
8. Concave Points: Number of concave portions of the contour.
9. Symmetry: Symmetry of the nucleus.
10. Fractal Dimension: "Coastline approximation" - 1 (roughness).
Each of these 10 basic features is computed in three different ways (mean, standard error,
and "worst" or largest). Thus, there are 30 features in total.
Dataset Format
The dataset is usually presented in a tabular format with the following columns:
• ID Number: Unique identifier for each instance.
• Diagnosis: Target class (M = malignant, B = benign).
• 30 Feature Columns: As described above.
Algorithm :
The K-Nearest Neighbors (K-NN) algorithm is a supervised learning technique that classifies
data based on the closest training examples in the feature space. The core idea behind K-NN
is simple: when a new data point is given, it is classified based on the majority class of its K
nearest neighbors from the training dataset. The distance between data points is usually
measured using metrics such as Euclidean distance.
Key Steps of K-NN Algorithm:
1. Choose the number K of neighbors. This is a hyperparameter that can be optimized
through cross-validation.
2. Compute the distance between the new data point and all points in the training
dataset.
3. Sort the distances in ascending order and select the K nearest points.
4. Determine the majority class from the K neighbors and assign this class to the new
data point.
5. Handle ties with additional heuristics, such as preferring the class of the single nearest
neighbour or breaking the tie at random. A from-scratch sketch of these steps follows this list.
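A from-scratch sketch of these steps (Euclidean distance plus a majority vote), assuming NumPy arrays for X_train and y_train; this is purely illustrative, since the program below uses scikit-learn's implementation:
import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, x_new, k=5):
    # Step 2: Euclidean distance from the new point to every training point
    distances = np.sqrt(((X_train - x_new) ** 2).sum(axis=1))
    # Step 3: indices of the K nearest neighbours
    nearest = np.argsort(distances)[:k]
    # Step 4: majority vote among their labels
    return Counter(y_train[nearest]).most_common(1)[0][0]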
Program :
# Import necessary libraries
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler, LabelEncoder
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score, classification_report
# Load the dataset (same CSV as in the earlier experiments)
data = pd.read_csv(r'C:\Users\tulik\Desktop\IGDTUW\ML\ML Lab\Lab Datasets\breast_cancer_wisconsin.csv')
data = data.dropna()
label_encoders = {}
for column in data.select_dtypes(include=['object']).columns:
    le = LabelEncoder()
    data[column] = le.fit_transform(data[column])
    label_encoders[column] = le
X = data.iloc[:, :-1]
y = data.iloc[:, -1]
# Split dataset into training and testing sets (70% training, 30% testing)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)
# Initialize the K-NN classifier with K=5 (you can experiment with different K values)
knn = KNeighborsClassifier(n_neighbors=5)
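The listing ends before the classifier is fitted; a minimal sketch of one possible completion, standardizing the features first since K-NN is distance based:
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)
knn.fit(X_train, y_train)
y_pred = knn.predict(X_test)
print("Accuracy:", accuracy_score(y_test, y_pred))
print(classification_report(y_test, y_pred))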
Output :
The accuracy upon using K-NN as the classifier is 95.91% for K = 5 neighbours. We can see the
binary predictions and the confusion matrix below.
If we increase the number of neighbours to 50, the accuracy drops :
Conclusion :
K-NN: The K-NN algorithm works well for this task because it is intuitive and performs
reasonably well on this dataset. However, it can be sensitive to the value of K, so we tune K
to its optimal value (low in this case).