MACHINE LEARNING

SKILL WORKBOOK
22AIP3101R

STUDENT ID:
ACADEMIC YEAR: 2024-25
STUDENT NAME:
SEM/YEAR:
DEPARTMENT:
Table of Contents
S. No. | List of Experiments | Session | Page No.
1 | Analyse a given dataset by applying various data preprocessing and data exploration techniques. | | 1
2 | Build a machine learning model to forecast the solar plant output to the extent possible which can be used for better Grid Management. | | 5
3 | Build a machine learning model to predict whether a person has heart disease or not. | | 8
4 | Based on census data, build a machine learning model to classify whether income exceeds $50K/yr. | | 11
5 | Build a machine learning model to predict new medicines with BELKA. | | 14
6 | Build a machine learning model to predict heart disease from anonymized data. | | 17
7 | Build a machine learning model that automatically categorizes and labels new products added to the store, ensuring consistent and accurate product classification. | | 19
8 | Build a machine learning pricing model and compete against other players for profit. | | 21
9 | Build a machine learning model for the insurance company that has decided to implement an anomaly detection system that can automatically flag suspicious claims for further investigation. | | 23
10 | Build a machine learning model to predict the weather to improve their decision-making on typical farming activities such as planting and irrigating. | | 25
S. No | Date | Experiment Name | Pre-Lab (10M) | In-Lab (25M): Program/Procedure (5M), Data and Results (10M), Analysis & Inference (10M) | Post-Lab (10M) | Viva Voce (5M) | Total (50M) | Faculty Signature

1. Analyse a given dataset by applying various data preprocessing and data exploration techniques.
2. Build a machine learning model to forecast the solar plant output to the extent possible which can be used for better Grid Management.
3. Build a machine learning model to predict whether a person has heart disease or not.
4. Based on census data, build a machine learning model to classify whether income exceeds $50K/yr.
5. Build a machine learning model to predict new medicines with BELKA.
6. Build a machine learning model to predict heart disease from anonymized data.
7. Build a machine learning model that automatically categorizes and labels new products added to the store, ensuring consistent and accurate product classification.
8. Build a machine learning pricing model and compete against other players for profit.
9. Build a machine learning model for the insurance company that has decided to implement an anomaly detection system that can automatically flag suspicious claims for further investigation.
10. Build a machine learning model to predict the weather to improve their decision-making on typical farming activities such as planting and irrigating.
A.Y. 2024-25 SKILL CONTINUOUS EVALUATION
Experiment # <TO BE FILLED BY STUDENT> Student ID <TO BE FILLED BY STUDENT>
Date <TO BE FILLED BY STUDENT> Student Name <TO BE FILLED BY STUDENT>

Experiment Title: Analyse a given dataset by applying various data preprocessing and data
exploration techniques.

Aim:

To analyse a given dataset by applying data preprocessing and exploration techniques using
Python, ensuring data quality and gaining initial insights for further analysis.

Objective:
1. Perform data cleaning, including handling missing values and outliers.
2. Apply data transformation techniques such as scaling or encoding.
3. Conduct exploratory data analysis (EDA) using visualization and descriptive statistics.
4. Derive key insights and summarize dataset characteristics.
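
The cleaning and scaling steps named in objectives 1 and 2 can be tried on a small table before running them on a real dataset. The sketch below is only an illustration: the `toy` DataFrame and its values are hypothetical, and it assumes every column is numeric.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler

# Hypothetical toy data: "income" contains one obvious outlier (500).
toy = pd.DataFrame({
    "age": [22, 25, 27, 30, 31, 33, 35, 38],
    "income": [30, 32, 35, 36, 38, 40, 41, 500],
})

# IQR rule: keep rows whose values all lie within
# [Q1 - 1.5 * IQR, Q3 + 1.5 * IQR] for every column.
q1 = toy.quantile(0.25)
q3 = toy.quantile(0.75)
iqr = q3 - q1
inside = ((toy >= q1 - 1.5 * iqr) & (toy <= q3 + 1.5 * iqr)).all(axis=1)
cleaned = toy[inside]

# Standardization: each column ends up with mean 0 and unit variance.
scaled = pd.DataFrame(
    StandardScaler().fit_transform(cleaned),
    columns=cleaned.columns,
    index=cleaned.index,
)

print("Rows before/after outlier removal:", len(toy), len(cleaned))
print(scaled.mean().round(6))
print(scaled.std(ddof=0).round(6))
```

Standardization changes the location and scale of a column but not the shape of its distribution, so a skewed feature stays skewed after scaling.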

Python Code:
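import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.preprocessing import StandardScaler, LabelEncoder

# Load the dataset (replace the file name with your own)
df = pd.read_csv("your_dataset.csv")

# Fill missing numeric values with the column mean, then fill any
# remaining (non-numeric) gaps with the placeholder "Unknown"
df.fillna(df.mean(numeric_only=True), inplace=True)
df.fillna("Unknown", inplace=True)

# Remove outliers from the numeric columns using the IQR rule
numeric_cols = df.select_dtypes(include=[np.number]).columns
Q1 = df[numeric_cols].quantile(0.25)
Q3 = df[numeric_cols].quantile(0.75)
IQR = Q3 - Q1
outliers = ((df[numeric_cols] < (Q1 - 1.5 * IQR)) | (df[numeric_cols] > (Q3 + 1.5 * IQR))).any(axis=1)
df = df[~outliers].copy()

# Standardize the numeric features (zero mean, unit variance)
scaler = StandardScaler()
df[numeric_cols] = scaler.fit_transform(df[numeric_cols])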

# Encode each categorical column as integer labels
encoder = LabelEncoder()
categorical_features = df.select_dtypes(include=[object])
for col in categorical_features.columns:
    df[col] = encoder.fit_transform(df[col].astype(str))

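# Descriptive statistics
print(df.describe())

# Correlation heatmap
plt.figure(figsize=(10, 8))
sns.heatmap(df.corr(), annot=True, cmap="coolwarm")
plt.title("Correlation Heatmap")
plt.show()

# Pairwise relationships between features
sns.pairplot(df)
plt.show()

# Distribution of one column (replace 'specific_feature' with a real column name)
df['specific_feature'].hist(bins=20)
plt.title("Distribution of Specific Feature")
plt.xlabel("Value")
plt.ylabel("Frequency")
plt.show()

print("Dataset shape:", df.shape)
print("Dataset info:")
df.info()  # info() prints its summary directly and returns None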

Result:

Observation:
1. Data Cleaning:
• Presence of missing values in specific columns and how they were handled.
• Detection and treatment of outliers using the IQR method.
2. Data Transformation:
• Numeric features standardized successfully, giving each feature a mean of 0 and a standard deviation of 1.
• Categorical variables encoded into numerical representations.

Evaluator Remark (if Any):


Marks Secured: _out of 50

Signature of the Evaluator with Date

Evaluator MUST ask Viva-voce prior to signing and posting marks for each experiment.
Experiment # <TO BE FILLED BY STUDENT> Student ID <TO BE FILLED BY STUDENT>
Date <TO BE FILLED BY STUDENT> Student Name <TO BE FILLED BY STUDENT>

Experiment Title: Build a machine learning model to forecast the solar plant output to the
extent possible which can be used for better Grid Management.
Aim:
To build a machine learning model that forecasts the solar plant output, aiding in better grid
management by predicting solar energy generation based on historical data.
Objective:
1. Collect and preprocess historical solar energy generation data.
2. Engineer relevant features such as weather conditions, time of day, and location.
3. Train a machine learning model (e.g., linear regression, decision trees, or deep learning)
to predict solar output.
4. Evaluate the model using appropriate metrics (e.g., Mean Squared Error, R-squared).
5. Visualize the predicted output vs. actual values to assess the model's accuracy.
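
Objective 2 mentions time of day as a feature. A minimal sketch of one way to derive it is shown below. The `readings` table is hypothetical, and the sine/cosine (cyclical) encoding is an optional addition that is not part of the workbook's own listing; it is included because the raw hour treats 23:00 and 00:00 as far apart even though they are adjacent.

```python
import numpy as np
import pandas as pd

# Hypothetical hourly readings; the "date" column name mirrors the workbook's code.
readings = pd.DataFrame({
    "date": pd.date_range("2024-06-01 00:00", periods=24, freq=pd.Timedelta(hours=1)),
    "temperature": np.linspace(18, 35, 24),
})

# Plain hour-of-day feature (0..23), as used in the workbook's listing.
readings["hour"] = readings["date"].dt.hour

# Optional cyclical encoding: map the hour onto a circle.
angle = 2 * np.pi * readings["hour"] / 24
readings["hour_sin"] = np.sin(angle)
readings["hour_cos"] = np.cos(angle)


def gap(h1, h2):
    """Euclidean distance between two hours in the (sin, cos) encoding."""
    a = readings.loc[readings["hour"] == h1, ["hour_sin", "hour_cos"]].to_numpy()[0]
    b = readings.loc[readings["hour"] == h2, ["hour_sin", "hour_cos"]].to_numpy()[0]
    return float(np.linalg.norm(a - b))


gap_23_to_0 = gap(23, 0)
gap_0_to_1 = gap(0, 1)
print(readings[["hour", "hour_sin", "hour_cos"]].head())
print("Distance 23->0:", gap_23_to_0, "Distance 0->1:", gap_0_to_1)
```

Tree-based models such as a random forest can often work with the raw hour, so the cyclical columns matter most for linear or distance-based models.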
Python Code:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.preprocessing import StandardScaler

# Load the data and fill missing numeric values with the column mean
df = pd.read_csv("/content/solar_data.csv")
df.fillna(df.mean(numeric_only=True), inplace=True)

# Derive the hour of day from the timestamp column
df['hour'] = pd.to_datetime(df['date']).dt.hour
features = ['temperature', 'humidity', 'hour']
X = df[features]
y = df['solar_output']

# Standardize the features
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)

X_train, X_test, y_train, y_test = train_test_split(X_scaled, y, test_size=0.2, random_state=42)

# Train the random forest regressor
model = RandomForestRegressor(n_estimators=100, random_state=42)
model.fit(X_train, y_train)
y_pred = model.predict(X_test)

mse = mean_squared_error(y_test, y_pred)
r2 = r2_score(y_test, y_pred)

print(f"Mean Squared Error: {mse}")
print(f"R-squared: {r2}")

# Plot actual against predicted output for the test samples
plt.figure(figsize=(10, 6))
plt.plot(y_test.values, label='Actual Output')
plt.plot(y_pred, label='Predicted Output', linestyle='--')
plt.legend()
plt.title('Solar Output: Actual vs Predicted')
plt.xlabel('Samples')
plt.ylabel('Solar Output')
plt.show()

Result:

Observation:

1. The Mean Squared Error (MSE) provides an indication of how well the model's
predictions match the actual data. A lower MSE value indicates better model
performance, with a smaller difference between predicted and actual solar
generation values.
2. The R-squared (R²) value measures the proportion of variance in the target variable
(solar generation) that is explained by the model. An R² close to 1 indicates that the
model is able to explain most of the variability in the target variable, suggesting a
good fit. An R² value near 0 implies that the model is not capturing much of the data's
variability.
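
To make the two metrics above concrete, the sketch below computes them by hand and compares the results with scikit-learn. The four target and prediction values are hypothetical numbers chosen only for illustration.

```python
import numpy as np
from sklearn.metrics import mean_squared_error, r2_score

# Hypothetical actual and predicted solar output values.
y_true = np.array([10.0, 12.0, 15.0, 20.0])
y_hat = np.array([11.0, 11.5, 14.0, 21.0])

# MSE: average of the squared residuals.
mse_manual = np.mean((y_true - y_hat) ** 2)

# R-squared: 1 - (residual sum of squares / total sum of squares).
ss_res = np.sum((y_true - y_hat) ** 2)
ss_tot = np.sum((y_true - y_true.mean()) ** 2)
r2_manual = 1 - ss_res / ss_tot

# A model that always predicts the mean of y_true scores R-squared = 0.
r2_mean_baseline = r2_score(y_true, np.full_like(y_true, y_true.mean()))

print("MSE (manual, sklearn):", mse_manual, mean_squared_error(y_true, y_hat))
print("R2  (manual, sklearn):", r2_manual, r2_score(y_true, y_hat))
print("R2 of mean-only baseline:", r2_mean_baseline)
```

Because R-squared is measured relative to the mean-only baseline, a model that predicts worse than that baseline gets a negative value.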

Evaluator Remark (if Any):


Marks Secured: _out of 50

Signature of the Evaluator with Date

Evaluator MUST ask Viva-voce prior to signing and posting marks for each experiment.

Experiment # <TO BE FILLED BY STUDENT> Student ID <TO BE FILLED BY STUDENT>
Date <TO BE FILLED BY STUDENT> Student Name <TO BE FILLED BY STUDENT>

Experiment Title: Build a machine learning model to predict whether a person has heart
disease or not.
Aim:
To build a machine learning model that predicts the solar power generation (in GWh) based on
factors such as the number of solar plants, installed capacity, and average MW per plant.
Objective:
1. Preprocess the dataset by handling missing values and scaling the features.
2. Select relevant features (Number of Solar Plants, Installed Capacity (MW), Average
MW Per Plant) and target variable (Generation (GWh)).
3. Train a regression model (Random Forest Regressor) using the training data.
4. Evaluate the model's performance using Mean Squared Error (MSE) and R-squared.
5. Analyze the feature importance to understand which factors contribute most to
the prediction.
Python Code:
import pandas as pd
import numpy as np

# Load the dataset (replace with your file path)
df = pd.read_csv("/content/solar.csv")

# Display columns and the first few rows to understand the dataset
print(df.columns)
print(df.head())

# Handle missing values: Only apply fillna() to numeric columns
numeric_columns = df.select_dtypes(include=[np.number]).columns
df[numeric_columns] = df[numeric_columns].fillna(df[numeric_columns].mean())

# Feature selection: Define the columns that will be used for prediction
features = ['Number of Solar Plants', 'Installed Capacity (MW)', 'Average MW Per Plant']
X = df[features]
y = df['Generation (GWh)']  # Target variable: Generation (GWh)

# Feature scaling
from sklearn.preprocessing import StandardScaler
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)

# Train-test split
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X_scaled, y, test_size=0.2, random_state=42)

# Model building: Using Random Forest Regressor
from sklearn.ensemble import RandomForestRegressor
model = RandomForestRegressor(n_estimators=100, random_state=42)
model.fit(X_train, y_train)

# Predictions and evaluation
y_pred = model.predict(X_test)

from sklearn.metrics import mean_squared_error, r2_score
mse = mean_squared_error(y_test, y_pred)
r2 = r2_score(y_test, y_pred)

print(f"Mean Squared Error: {mse}")
print(f"R-squared: {r2}")

# Feature importance
importances = model.feature_importances_
print("Feature Importance:", importances)

Result:

Observation:
1. Mean Squared Error (MSE): A lower MSE indicates better model performance, as it
signifies smaller deviations between predicted and actual values.
2. R-squared (R²): An R² value close to 1 indicates that the model explains most of the
variance in the target variable (generation).

3. Feature Importance: The importances reported by the model show which features (e.g.,
Number of Solar Plants, Installed Capacity (MW)) contribute most to predicting the solar
generation output.
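
A raw importance array is easier to read when each value is paired with its feature name. The sketch below shows one way to do that. The data is synthetic and generated under the stated assumption that generation depends mostly on installed capacity, so the ranking it prints says nothing about any real dataset.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)

# Synthetic features using the workbook's column names (values are made up).
X = pd.DataFrame({
    "Number of Solar Plants": rng.integers(1, 50, size=200),
    "Installed Capacity (MW)": rng.uniform(10, 500, size=200),
    "Average MW Per Plant": rng.uniform(1, 20, size=200),
})

# Assumption for this sketch: the target is driven mostly by installed capacity.
y = 1.5 * X["Installed Capacity (MW)"] + rng.normal(0, 5, size=200)

model = RandomForestRegressor(n_estimators=100, random_state=42).fit(X, y)

# Pair each importance with its column name and sort from most to least important.
importance = pd.Series(model.feature_importances_, index=X.columns)
importance = importance.sort_values(ascending=False)
print(importance)
```

The impurity-based importances of a random forest sum to 1, so each value can be read as a share of the total.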

Evaluator Remark (if Any):


Marks Secured: _out of 50

Signature of the Evaluator with Date

Evaluator MUST ask Viva-voce prior to signing and posting marks for each experiment.

Experiment # <TO BE FILLED BY STUDENT> Student ID <TO BE FILLED BY STUDENT>
Date <TO BE FILLED BY STUDENT> Student Name <TO BE FILLED BY STUDENT>

Experiment Title: Based on census data, build a machine learning model to classify
whether income exceeds $50K/yr.
Aim:
To build a machine learning model that classifies individuals based on census data, predicting
whether their income exceeds $50K per year or not.
Objective:
1. Preprocess the census dataset by handling missing values, encoding categorical
variables, and scaling numerical features.
2. Select relevant features and target variable for classification (e.g., age, education,
occupation, etc.).
3. Split the data into training and testing sets.
4. Train a classification model (e.g., Logistic Regression, Decision Tree, or Random
Forest) using the training data.
5. Evaluate the model's performance using accuracy, precision, recall, and F1-score.
6. Analyze the importance of features in predicting whether the income exceeds $50K
per year.
Python Code:
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler, LabelEncoder
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, classification_report

# Load the census data and drop rows with missing values
df = pd.read_csv('/path/to/census_data.csv')
df = df.dropna()

# Encode every categorical column (including the target) as integer labels
label_encoder = LabelEncoder()
categorical_columns = df.select_dtypes(include=['object']).columns
for col in categorical_columns:
    df[col] = label_encoder.fit_transform(df[col])

X = df.drop('income', axis=1)
y = df['income']

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Fit the scaler on the training data only, then apply it to the test data
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

model = RandomForestClassifier(n_estimators=100, random_state=42)
model.fit(X_train, y_train)

y_pred = model.predict(X_test)

accuracy = accuracy_score(y_test, y_pred)
print(f"Accuracy: {accuracy * 100:.2f}%")

print(classification_report(y_test, y_pred))

# Print each feature with its importance
importances = model.feature_importances_
feature_names = X.columns
print("Feature Importance:")
for name, importance in zip(feature_names, importances):
    print(f"{name}: {importance}")

Result:

Observation:
1. Accuracy: The percentage of correct predictions made by the model.
2. Precision: The proportion of true positives out of all predicted positives (useful for
minimizing false positives).
3. Recall: The proportion of true positives out of all actual positives (useful for minimizing
false negatives).

4. F1-Score: The harmonic mean of precision and recall, offering a balance between the
two metrics.
5. Feature Importance: Understanding which features (e.g., education level, occupation,
marital status) have the most significant impact on the classification outcome.
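
The definitions above can be checked by computing each metric from the confusion-matrix counts. The labels in this sketch are hypothetical (1 standing for income above $50K, 0 otherwise) and serve only to show the arithmetic.

```python
from sklearn.metrics import (
    accuracy_score,
    confusion_matrix,
    f1_score,
    precision_score,
    recall_score,
)

# Hypothetical true and predicted labels for ten people.
y_true = [1, 0, 1, 1, 0, 0, 1, 0, 0, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0, 0, 0]

# For binary labels, ravel() yields the counts in the order tn, fp, fn, tp.
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()

accuracy = (tp + tn) / (tp + tn + fp + fn)
precision = tp / (tp + fp)   # of the predicted positives, how many are correct
recall = tp / (tp + fn)      # of the actual positives, how many are found
f1 = 2 * precision * recall / (precision + recall)

print("Counts (tn, fp, fn, tp):", tn, fp, fn, tp)
print("Accuracy :", accuracy, accuracy_score(y_true, y_pred))
print("Precision:", precision, precision_score(y_true, y_pred))
print("Recall   :", recall, recall_score(y_true, y_pred))
print("F1-score :", f1, f1_score(y_true, y_pred))
```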

Evaluator Remark (if Any):


Marks Secured: _out of 50

Signature of the Evaluator with Date

Evaluator MUST ask Viva-voce prior to signing and posting marks for each experiment.

Experiment # <TO BE FILLED BY STUDENT> Student ID <TO BE FILLED BY STUDENT>
Date <TO BE FILLED BY STUDENT> Student Name <TO BE FILLED BY STUDENT>

Experiment Title: Build a machine learning model to predict new medicines with BELKA.
Aim:
To develop and train a machine learning model using the BELKA dataset to predict new
medicines, identifying potential candidates for drug development based on various
chemical and biological features.

Objectives:
1. Import and preprocess the BELKA dataset, handling missing values and
encoding categorical variables.
2. Perform feature selection and scaling for better model performance.
3. Train a machine learning model (e.g., Random Forest or Support Vector Machine)
using the preprocessed dataset.
4. Evaluate the model's performance using appropriate metrics such as accuracy,
precision, recall, and F1-score.
5. Analyze the feature importance to understand which factors contribute most
to predicting new medicines.

Python Code:
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, classification_report

# Small sample dataset used to illustrate the workflow
data = {
    'chemical_composition': [0.12, 0.15, 0.10, 0.20, 0.18, 0.17, 0.22, 0.13, 0.14, 0.19],
    'toxicity': [0.8, 0.7, 0.6, 0.9, 0.85, 0.75, 0.65, 0.77, 0.72, 0.78],
    'efficacy': [0.85, 0.75, 0.80, 0.70, 0.90, 0.85, 0.88, 0.80, 0.75, 0.79],
    'side_effects': [0.1, 0.2, 0.15, 0.3, 0.05, 0.12, 0.17, 0.14, 0.22, 0.16],
    'drug_class': [1, 0, 1, 0, 1, 1, 0, 1, 0, 1]
}

df = pd.DataFrame(data)

X = df.drop('drug_class', axis=1)
y = df['drug_class']

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Fit the scaler on the training data only, then apply it to the test data
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

model = RandomForestClassifier(n_estimators=100, random_state=42)
model.fit(X_train, y_train)

y_pred = model.predict(X_test)

accuracy = accuracy_score(y_test, y_pred)
print(f"Accuracy: {accuracy * 100:.2f}%")

print(classification_report(y_test, y_pred))

# Print each feature with its importance
importances = model.feature_importances_
feature_names = X.columns
print("Feature Importance:")
for name, importance in zip(feature_names, importances):
    print(f"{name}: {importance}")
Result:

Observations:
1. Accuracy: The proportion of test samples classified correctly, which indicates the model's
ability to predict whether a drug is beneficial based on the features provided (e.g.,
chemical composition, toxicity, efficacy, side effects). With only ten sample rows and
test_size=0.2, the test set holds two samples, so the reported accuracy is coarse and
should be interpreted with caution.
2. Precision, Recall, F1-Score: These metrics offer more insight into the
model's performance on the minority class.
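
Per-class metrics make the minority-class behaviour visible where overall accuracy hides it. The sketch below uses hypothetical labels (class 0 is the minority) and scikit-learn's `precision_recall_fscore_support` to report each class separately.

```python
from sklearn.metrics import accuracy_score, precision_recall_fscore_support

# Hypothetical labels: seven samples of class 1 and three of class 0 (the minority).
y_true = [1, 1, 1, 1, 1, 1, 1, 0, 0, 0]
y_pred = [1, 1, 1, 1, 1, 1, 1, 1, 1, 0]

overall_accuracy = accuracy_score(y_true, y_pred)

# One entry per label, in the order given by `labels`.
precision, recall, f1, support = precision_recall_fscore_support(
    y_true, y_pred, labels=[0, 1], zero_division=0
)

print("Overall accuracy:", overall_accuracy)
for label, p, r, f, s in zip([0, 1], precision, recall, f1, support):
    print(f"class {label}: precision={p:.2f} recall={r:.2f} f1={f:.2f} support={s}")
```

In this example the overall accuracy is 0.8, yet only one of the three minority-class samples is recovered.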

Evaluator Remark (if Any):


Marks Secured: _out of 50

Signature of the Evaluator with Date

Evaluator MUST ask Viva-voce prior to signing and posting marks for each experiment.
Experiment # <TO BE FILLED BY STUDENT> Student ID <TO BE FILLED BY STUDENT>
Date <TO BE FILLED BY STUDENT> Student Name <TO BE FILLED BY STUDENT>

Experiment Title: Build a machine learning model to predict heart disease
from anonymized data.
Aim:
To build a machine learning model to predict the presence of heart disease based
on anonymized data.
Objectives:
1. Load and preprocess the dataset.
2. Handle missing values and standardize the features.
3. Split the dataset into training and testing sets.
4. Train a logistic regression model to classify heart disease.
5. Evaluate the model using accuracy, precision, recall, and F1-score.
Python Code:
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, classification_report

# Load the data and fill missing numeric values with the column mean
df = pd.read_csv("/content/heart_disease.csv")
df.fillna(df.mean(numeric_only=True), inplace=True)

X = df.drop('target', axis=1)
y = df['target']

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Fit the scaler on the training data only, then apply it to the test data
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

model = LogisticRegression()
model.fit(X_train, y_train)
y_pred = model.predict(X_test)

accuracy = accuracy_score(y_test, y_pred)


classification_rep = classification_report(y_test, y_pred)

print(f"Accuracy: {accuracy * 100:.2f}%")


print("Classification Report:\n", classification_rep)

Result:

Observation:
1. The model's accuracy may vary depending on the quality and size of the dataset. A
higher accuracy indicates a good model fit, while a lower accuracy suggests the need
for further tuning or improved data processing.
2. The precision, recall, and F1-score are useful to evaluate the model's
performance, especially when dealing with imbalanced datasets.
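
The point about imbalanced datasets can be illustrated with a baseline that always predicts the majority class. The sketch below uses hypothetical labels (90 patients without heart disease, 10 with) and scikit-learn's `DummyClassifier`; it is a demonstration of the metric behaviour, not part of the experiment's model.

```python
import numpy as np
from sklearn.dummy import DummyClassifier
from sklearn.metrics import accuracy_score, recall_score

# Hypothetical imbalanced labels: 90 healthy (0) and 10 with heart disease (1).
y = np.array([0] * 90 + [1] * 10)
X = np.zeros((100, 1))  # placeholder features; this baseline ignores them

# Baseline that always predicts the most frequent class seen during fit.
baseline = DummyClassifier(strategy="most_frequent").fit(X, y)
pred = baseline.predict(X)

acc = accuracy_score(y, pred)
rec = recall_score(y, pred, zero_division=0)

print("Baseline accuracy:", acc)
print("Baseline recall for the disease class:", rec)
```

The baseline reaches 90% accuracy while finding none of the patients with heart disease, which is why recall and F1-score should be reported alongside accuracy.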

Evaluator Remark (if Any):


Marks Secured: _out of 50

Signature of the Evaluator with Date

Evaluator MUST ask Viva-voce prior to signing and posting marks for each experiment.

Experiment # <TO BE FILLED BY STUDENT> Student ID <TO BE FILLED BY STUDENT>
Date <TO BE FILLED BY STUDENT> Student Name <TO BE FILLED BY STUDENT>

Experiment Title: Build a machine learning model that automatically categorizes and labels
new products added to the store, ensuring consistent and accurate product classification.

Aim:
To build a machine learning model that automatically categorizes and labels new products
added to a store, ensuring consistent and accurate product classification.

Objectives:
1. Preprocess the product data for feature extraction and transformation.
2. Train a machine learning classification model to categorize products into
predefined classes.
3. Evaluate the model's performance using appropriate metrics.
4. Predict the category of new products using the trained model.

Python Code:
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report

# Small sample dataset; note that all five rows share the category 'Electronics'
data = {'Product': ['Wireless Mouse', 'Gaming Laptop', 'Bluetooth Speaker', 'Smartphone', 'LED Monitor'],
        'Category': ['Electronics', 'Electronics', 'Electronics', 'Electronics', 'Electronics'],
        'Description': ['Wireless mouse with ergonomic design', 'High-performance laptop for gaming',
                        'Portable Bluetooth speaker with great sound', 'Latest smartphone with 5G',
                        'Full HD LED monitor']}
df = pd.DataFrame(data)
X = df['Description']
y = df['Category']

# Convert the descriptions to TF-IDF feature vectors
vectorizer = TfidfVectorizer()
X_transformed = vectorizer.fit_transform(X)
X_train, X_test, y_train, y_test = train_test_split(X_transformed, y, test_size=0.3, random_state=42)

model = RandomForestClassifier(n_estimators=100)
model.fit(X_train, y_train)
y_pred = model.predict(X_test)
print(classification_report(y_test, y_pred))

# Predict the category of new, unseen products
new_products = ['Noise Cancelling Headphones', '4K Smart TV', 'Gaming Keyboard']
new_products_transformed = vectorizer.transform(new_products)
predictions = model.predict(new_products_transformed)
print(predictions)

Result:

Observations:
1. The model correctly categorizes products based on their descriptions.
2. The evaluation metrics (e.g., accuracy, precision, recall) show how well the
model performs on the test set.
3. Feature extraction using TF-IDF is effective for handling text data.
4. The model can be extended to handle more complex datasets by adding more
categories and enhancing preprocessing techniques.
5. The model can be used to classify new products automatically, ensuring consistent
and accurate labeling.
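
Observation 4 mentions extending the model to more categories. A minimal sketch of that extension follows, assuming a hypothetical training set with three categories and wrapping the TF-IDF step and the classifier in a scikit-learn pipeline so that new descriptions pass through the same transformation automatically. The descriptions and category names are invented for illustration.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline

# Hypothetical training data with more than one category.
descriptions = [
    "Wireless mouse with ergonomic design",
    "Portable Bluetooth speaker with great sound",
    "Cotton t-shirt with short sleeves",
    "Slim fit denim jeans",
    "Stainless steel frying pan",
    "Non-stick saucepan with glass lid",
]
categories = ["Electronics", "Electronics", "Clothing", "Clothing", "Kitchen", "Kitchen"]

# The pipeline applies TF-IDF first and then the classifier, for both fit and predict.
classifier = make_pipeline(
    TfidfVectorizer(),
    RandomForestClassifier(n_estimators=100, random_state=42),
)
classifier.fit(descriptions, categories)

# Raw text goes straight into predict(); no separate vectorizer call is needed.
new_descriptions = ["Bluetooth headphones with wireless charging", "Denim jacket"]
predicted = classifier.predict(new_descriptions)
for text, label in zip(new_descriptions, predicted):
    print(f"{text!r} -> {label}")
```

With so few training rows the predictions are only indicative; the sketch shows the mechanics rather than a usable product classifier.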

Evaluator Remark (if Any):


Marks Secured: _out of 50

Signature of the Evaluator with Date

Evaluator MUST ask Viva-voce prior to signing and posting marks for each experiment.

Experiment # <TO BE FILLED BY STUDENT> Student ID <TO BE FILLED BY STUDENT>
Date <TO BE FILLED BY STUDENT> Student Name <TO BE FILLED BY STUDENT>

Experiment Title: Build a machine learning pricing model and compete against other
players for profit.

Aim:
To build a machine learning pricing model to predict the optimal price of a product and
maximize profit by competing against other players in a market.

Objectives:
1. To build a machine learning model that predicts optimal pricing strategies based
on historical data, market trends, and competitor pricing.
2. To optimize pricing decisions in a competitive environment by using the model
to forecast price changes, demand elasticity, and maximize profits.
3. To evaluate the model's performance against competitors by simulating
market dynamics and adjusting pricing based on real-time market conditions.

Python Code:
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score

# Small sample dataset of products with cost, demand, and prices
data = {'Product': ['A', 'B', 'C', 'D', 'E'],
        'Cost': [10, 20, 30, 40, 50],
        'Demand': [1000, 800, 600, 400, 200],
        'Price': [25, 40, 45, 60, 70],
        'Competitor_Price': [28, 42, 44, 65, 68]}
df = pd.DataFrame(data)
X = df[['Cost', 'Demand', 'Competitor_Price']]
y = df['Price']
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
model = LinearRegression()
model.fit(X_train, y_train)
y_pred = model.predict(X_test)
# With five rows and test_size=0.2 the test set holds a single sample,
# so R-squared is not well defined here; use more data for a meaningful score
print(f"R-squared: {r2_score(y_test, y_pred)}")
print(f"Mean Squared Error: {mean_squared_error(y_test, y_pred)}")
# New products given as [Cost, Demand, Competitor_Price]
new_products = pd.DataFrame([[15, 700, 40], [25, 300, 60]], columns=X.columns)
predicted_prices = model.predict(new_products)
print(predicted_prices)

Result:

Observations:
1. The model predicts a price from cost, demand, and competitor price; because it is fitted to past prices, the prediction is optimal only to the extent that those prices were.
2. The R-squared value indicates the fit of the model to the data.
3. The Mean Squared Error quantifies the prediction error.
4. The model can be expanded with more features to improve pricing decisions.
5. The pricing strategy can be adjusted for maximum profit by considering
market conditions and competitor actions.
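
Observation 5 refers to adjusting the price for maximum profit. One simple way to sketch that idea is to fit a demand curve and then pick the price that maximizes (price - cost) x demand. The example below reuses the sample Price and Demand values from the listing and assumes, purely for illustration, that demand falls linearly with price; the `cost` value and the function names are hypothetical.

```python
import numpy as np

# Sample values from the workbook's pricing data.
price = np.array([25, 40, 45, 60, 70], dtype=float)
demand = np.array([1000, 800, 600, 400, 200], dtype=float)

# Assumed linear demand curve: demand = a + b * price (b should be negative).
b, a = np.polyfit(price, demand, deg=1)


def profit(p, cost):
    """Profit at price p for a unit cost, under the fitted demand curve."""
    return (p - cost) * (a + b * p)


def best_price(cost):
    """Closed-form maximizer of profit(p, cost): set the derivative to zero."""
    # d/dp [(p - cost) * (a + b * p)] = a + 2 * b * p - b * cost = 0
    return cost / 2 - a / (2 * b)


cost = 30.0  # hypothetical unit cost
p_star = best_price(cost)

# Cross-check the formula with a brute-force search over a price grid.
grid = np.linspace(cost, 80, 5001)
p_grid = grid[np.argmax(profit(grid, cost))]

print(f"Fitted demand: {a:.1f} + ({b:.2f}) * price")
print(f"Profit-maximizing price (formula): {p_star:.2f}")
print(f"Profit-maximizing price (grid)   : {p_grid:.2f}")
```

A competitive setting would also need a model of how rivals respond, which this sketch leaves out.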

Evaluator Remark (if Any):


Marks Secured: _out of 50

Signature of the Evaluator with Date

Evaluator MUST ask Viva-voce prior to signing and posting marks for each experiment.

Experiment # <TO BE FILLED BY STUDENT> Student ID <TO BE FILLED BY STUDENT>
Date <TO BE FILLED BY STUDENT> Student Name <TO BE FILLED BY STUDENT>

Experiment Title: Build a machine learning model for the insurance company that
has decided to implement an anomaly detection system that can automatically flag
suspicious claims for further investigation.
Aim:
To build a machine learning model for an insurance company that can automatically
detect anomalous or suspicious claims for further investigation.
Objectives:
1. Preprocess the historical claims data.
2. Apply anomaly detection techniques to identify suspicious claims.
3. Evaluate the model's performance using appropriate metrics.
4. Flag claims that exhibit anomalous behaviour, indicating potential fraud or errors.
Python Code:
import pandas as pd
from sklearn.ensemble import IsolationForest
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report, confusion_matrix

# Small sample dataset of insurance claims
data = {'Claim_ID': [1, 2, 3, 4, 5],
        'Claim_Amount': [5000, 2000, 15000, 10000, 30000],
        'Age': [25, 30, 40, 35, 50],
        'Vehicle_Age': [5, 3, 8, 6, 10],
        'Accident_Type': [0, 1, 1, 0, 0],
        'Claim_History': [0, 1, 1, 0, 1],
        'Claim_Status': [1, 0, 1, 0, 1]}
df = pd.DataFrame(data)
X = df[['Claim_Amount', 'Age', 'Vehicle_Age', 'Accident_Type', 'Claim_History']]
y = df['Claim_Status']
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

model = IsolationForest(contamination=0.2, random_state=42)
model.fit(X_train)

# IsolationForest.predict returns 1 for inliers and -1 for anomalies;
# map these to 1 (normal) and 0 (suspicious)
y_pred = model.predict(X_test)
y_pred = [1 if pred == 1 else 0 for pred in y_pred]

print("Confusion Matrix:")
print(confusion_matrix(y_test, y_pred))
print("\nClassification Report:")
print(classification_report(y_test, y_pred))

# Collect the test rows flagged as suspicious
df_test = X_test.copy()
df_test['y_pred'] = y_pred
suspicious_claims = df_test[df_test['y_pred'] == 0]
print("\nSuspicious Claims Detected:")
Course Title MACHINE LEARNING ACADEMIC YEAR: 2024-25
Course Code(s) 22AD2203R P a g e | 10
Experiment # <TO BE FILLED BY STUDENT> Student ID <TO BE FILLED BY STUDENT>
Date <TO BE FILLED BY STUDENT> Student Name <TO BE FILLED BY STUDENT>
print(suspicious_claims)

Result:

Observations:
1. The Isolation Forest model flags the claims it considers anomalous: a prediction of 0 marks a suspicious claim and 1 marks a normal claim.
2. The confusion matrix and classification report show where the predictions disagree with Claim_Status, that is, the false positives and false negatives.
3. Claims flagged as suspicious (y_pred = 0) are listed for further investigation. Because the test split holds only one of the five claims, at most one claim can be flagged in a single run.
4. With only five records the results are illustrative; further fine-tuning and more data are necessary before the model can be relied on in real-world applications.
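
The Python code for this experiment relies on the Isolation Forest convention that predict() returns +1 for inliers and -1 for anomalies. The following is a minimal sketch of that behaviour on synthetic data (the amounts, ages, and the planted extreme claim are all hypothetical and are not part of the experiment's dataset). It also shows decision_function(), which gives a score that can be used to rank claims from most to least suspicious:

```python
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(0)

# Synthetic data (hypothetical): 200 ordinary claims plus one extreme claim.
amounts = rng.normal(loc=5000, scale=500, size=200)
ages = rng.normal(loc=40, scale=8, size=200)
X = np.column_stack([amounts, ages])
X = np.vstack([X, [60000.0, 85.0]])  # the last row is the planted anomaly

# contamination is the expected share of anomalies; 0.05 is an assumed value.
model = IsolationForest(contamination=0.05, random_state=42)
model.fit(X)

labels = model.predict(X)            # +1 = inlier, -1 = anomaly
scores = model.decision_function(X)  # lower score = more anomalous

flagged = np.where(labels == -1)[0]
most_anomalous = int(np.argmin(scores))
print("Number of flagged rows:", len(flagged))
print("Most anomalous row:", most_anomalous, X[most_anomalous])
```

Ranking by score instead of using only the +1/-1 label lets an investigator review the most unusual claims first, and the contamination value can be adjusted to match the share of claims the company can afford to review.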

Evaluator Remark (if Any):


Marks Secured: ____ out of 50

Signature of the Evaluator with Date

Evaluator MUST ask Viva-voce prior to signing and posting marks for each experiment.

Course Title MACHINE LEARNING ACADEMIC YEAR: 2024-25


Course Code(s) 22AD2203R P a g e | 10
Experiment # <TO BE FILLED BY STUDENT> Student ID <TO BE FILLED BY STUDENT>
Date <TO BE FILLED BY STUDENT> Student Name <TO BE FILLED BY STUDENT>

Experiment Title: Build a machine learning model to predict the weather to improve
their decision-making on typical farming activities such as planting and irrigating.

Aim:
To build a machine learning model that predicts weather conditions to help in
improving decision-making for typical farming activities such as planting and
irrigating.

Objectives:
1. Collect and preprocess weather-related data.
2. Train a model to predict weather conditions (e.g., temperature, rainfall, humidity).
3. Use the model to suggest optimal farming actions based on predicted
weather conditions.

Python Code:
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report, confusion_matrix

# Small illustrative weather dataset with the farming action taken
data = {'Temperature': [25, 28, 23, 30, 22, 26, 27, 24, 29, 21],
        'Humidity': [60, 55, 65, 50, 70, 60, 58, 67, 52, 71],
        'Rainfall': [0.2, 0.0, 0.4, 0.0, 0.3, 0.1, 0.0, 0.5, 0.0, 0.2],
        'Wind_Speed': [10, 12, 8, 5, 15, 10, 12, 8, 6, 13],
        'Farming_Action': ['Irrigate', 'Plant', 'Irrigate', 'Plant', 'Irrigate', 'Irrigate', 'Irrigate',
                           'Plant', 'Irrigate', 'Irrigate']}

df = pd.DataFrame(data)

# Weather measurements are the features; the farming action is the target
X = df[['Temperature', 'Humidity', 'Rainfall', 'Wind_Speed']]
y = df['Farming_Action']

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

model = RandomForestClassifier(random_state=42)
model.fit(X_train, y_train)

y_pred = model.predict(X_test)

print("Confusion Matrix:")
print(confusion_matrix(y_test, y_pred))
print("\nClassification Report:")
print(classification_report(y_test, y_pred))

# Show each test row alongside its predicted action
df_test = X_test.copy()
df_test['Predicted_Action'] = y_pred
print("\nPredicted Farming Actions:")
print(df_test)

Result:

Observations:
1. The model classifies the test data with an accuracy of 100%. The test split contains only 2 of the 10 samples, so this result shows that the pipeline works rather than how reliable the model is.
2. The confusion matrix shows no misclassifications between the Irrigate and Plant classes on the test data.
3. The classification report confirms this, with precision, recall, and F1-score of 1.00 for both classes.
4. The predicted farming actions match the recorded actions for the test samples; a larger dataset is needed before the predictions can guide real farming decisions.
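
Because the test split in this experiment holds only two samples, a single accuracy figure says little about how the model generalises. One possible way to get a steadier estimate is stratified k-fold cross-validation. The sketch below reuses the ten rows from the experiment; the choice of three folds and the shuffle seed are assumptions made for illustration:

```python
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score

# Same ten rows as the experiment's dataset.
df = pd.DataFrame({
    'Temperature': [25, 28, 23, 30, 22, 26, 27, 24, 29, 21],
    'Humidity': [60, 55, 65, 50, 70, 60, 58, 67, 52, 71],
    'Rainfall': [0.2, 0.0, 0.4, 0.0, 0.3, 0.1, 0.0, 0.5, 0.0, 0.2],
    'Wind_Speed': [10, 12, 8, 5, 15, 10, 12, 8, 6, 13],
    'Farming_Action': ['Irrigate', 'Plant', 'Irrigate', 'Plant', 'Irrigate',
                       'Irrigate', 'Irrigate', 'Plant', 'Irrigate', 'Irrigate'],
})
X = df[['Temperature', 'Humidity', 'Rainfall', 'Wind_Speed']]
y = df['Farming_Action']

# Three folds, because the smaller class (Plant) has only three samples.
cv = StratifiedKFold(n_splits=3, shuffle=True, random_state=42)
model = RandomForestClassifier(random_state=42)
scores = cross_val_score(model, X, y, cv=cv, scoring='accuracy')

print("Accuracy per fold:", scores)
print("Mean accuracy:", scores.mean())
```

Each sample is used for testing exactly once, so the mean accuracy across folds is less dependent on which two rows happen to land in a single test split.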

Evaluator Remark (if Any):


Marks Secured: ____ out of 50

Signature of the Evaluator with Date

Evaluator MUST ask Viva-voce prior to signing and posting marks for each experiment.
