
​ ​ ​ AI Lab Manual

​ ​ ​ ​
​ ​ ​ ​ Session 2023-2027

​ ​ ​ Submitted By:
​ ​ ​ 2023-CS-156 Khola Raouf
​ ​ ​ Supervised By:
​ ​ ​ ​ Mr. Waseem

​ ​ ​ ​ ​ Course:
CSC-371 Artificial Intelligence

Department of Computer Science


University of Engineering and Technology
Lahore, Pakistan

Contents
Week # 9
Question No. 1
1.1. Importing Libraries
1.2. Load the Dataset
1.3. Data Preprocessing
1.4. Model Initialization
1.5. Model Prediction
1.6. Visualizing Results
1.7. Evaluating the Model
1.8. Final Output
1.9. Full Code
1.10. Model Scores
Question No. 2
2.1. Importing Libraries
2.2. Load Dataset
2.3. Data Preprocessing
2.4. Model Initialization
2.5. Model Training
2.6. Model Prediction
2.7. Visualizing Results
2.8. Model Evaluation
2.9. Final Output
2.10. Full Code
Question No. 03
3.1. Evaluation Metrics for Classification Models
(a) Accuracy
(b) Precision
(c) Recall (Sensitivity or True Positive Rate)
(d) F1-Score
(e) Confusion Matrix
(f) ROC Curve & AUC (Area Under Curve)
3.2. Evaluation Metrics for Regression Models
(a) Mean Absolute Error (MAE)
(b) Mean Squared Error (MSE)
(c) Root Mean Squared Error (RMSE)
(d) R-squared (R² Score)
3.3. Evaluation Metrics for Clustering Models
(a) Silhouette Score
(b) Davies-Bouldin Index
(c) Dunn Index
Conclusion
Question No. 04
4. Data Preprocessing Techniques
4.1. Handling Missing or Null Values
4.2. Handling Duplicate Data
4.3. Handling Outliers
4.4. Encoding Categorical Data
4.5. Feature Scaling (Normalization & Standardization)
4.6. Feature Selection & Engineering
4.7. Splitting Data into Training and Testing Sets
Final Summary Table


Week # 9
Question No. 1
Multiple Linear Regression
1.1.​ Importing Libraries:
First, the required Python libraries are imported for data handling (pandas, numpy),
visualization (matplotlib, seaborn), and machine learning (scikit-learn).
Code:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
No output; libraries are just being imported.

1.2.​ Load the Dataset:


Load the dataset from a CSV file and display
the first few rows.
Code:

df = pd.read_csv("student_scores.csv")  # Update with the correct file name
print(df.head())  # Display first few rows
The output shows the first few rows of the dataset, for example:

Hours_Studied  Previous_Scores  Extracurricular  Final_Score
          5.1               78                2           82
          6.3               85                3           88
          4.8               75                1           80
          7.2               90                4           92
          5.5               79                2           83

1.3.​ Data Preprocessing:


1.3.1.​ Checking For Missing Values:
Ensure that there are no missing values in the
dataset.
Code:
print(df.isnull().sum())
A summary of missing values per column, typically:

Hours_Studied      0
Previous_Scores    0
Extracurricular    0
Final_Score        0
dtype: int64

If there were missing values, we would handle them using methods like .fillna() or .dropna(), as sketched below.
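For illustration, a minimal sketch of both options, assuming the same df and the column names shown above:

Code:
# Option 1: drop any rows that still contain missing values
df = df.dropna()

# Option 2: fill missing values in a numeric column with that column's mean
df['Hours_Studied'] = df['Hours_Studied'].fillna(df['Hours_Studied'].mean())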

1.3.2.​ Checking For Data type:


Identify the data types of each
column to ensure compatibility with the machine learning model.
Code:
print(df.dtypes)

Hours_Studied      float64
Previous_Scores      int64
Extracurricular      int64
Final_Score          int64
dtype: object

If data types are incorrect, we may need to convert them using .astype(), as in the sketch below.
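For example, a column that was read with the wrong type could be converted like this (a hypothetical snippet; here Final_Score is assumed to have been read as strings):

Code:
# Convert a column read as object/string into an integer type
df['Final_Score'] = df['Final_Score'].astype(int)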

1.3.3.​ Select Features and Target Variable:


●​ X: Contains independent variables (features).
●​ y: Contains the dependent variable (Final_Score).
Code:
X = df[['Hours_Studied', 'Previous_Scores', 'Extracurricular']]

y = df['Final_Score']
No direct output, but X and y now store data for model training.

1.3.4.​ Split Data into Training and Testing Sets:


Splits the data into 80% training data and 20% testing data.

Code:
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

No direct output, but now:

●​ X_train, y_train → Used for training


●​ X_test, y_test → Used for testing

1.4.​ Model Initialization:


Initialize a Linear Regression model and train it on X_train and y_train.

Code:
model = LinearRegression()
model.fit(X_train, y_train)
No direct output, but the model has now learned the relationship
between features and Final_Score.


1.5.​ Model Prediction:


Use the trained model to make predictions on the test set.

Code:
y_pred = model.predict(X_test)
No direct output, but y_pred contains predicted final scores for
X_test.

1.6.​ Visualizing Results:


●​ Compares actual vs. predicted values in a scatter plot.
●​ Ideally, the points should be close to a straight diagonal line.
Code:
plt.scatter(y_test, y_pred, color='blue')
plt.xlabel("Actual Scores")
plt.ylabel("Predicted Scores")
plt.title("Actual vs Predicted Scores")
plt.show()

A scatter plot showing how well the model's predictions match the
actual scores.

1.7.​ Evaluating the Model:


Evaluate model performance using:


●​ Mean Squared Error (MSE): Measures the error magnitude (lower is better).
●​ R² Score: Measures goodness of fit (closer to 1 is better).

Code:
mse = mean_squared_error(y_test, y_pred)
r2 = r2_score(y_test, y_pred)

print(f"Mean Squared Error: {mse}")


print(f"R-squared Score: {r2}")

1.8.​ Final Output:


1.9.​ Full Code:


import pandas as pd

import numpy as np

import matplotlib.pyplot as plt

import seaborn as sns

from sklearn.model_selection import train_test_split

from sklearn.linear_model import LinearRegression

from sklearn.metrics import mean_squared_error, r2_score

# Load dataset

df = pd.read_csv("Student_Performance.csv") # Ensure the correct file name

print("First few rows of the dataset:")

print(df.head()) # Display first few rows

# Check for missing values

print("\nMissing values in the dataset:")

print(df.isnull().sum())

# Check data types

print("\nData types of each column:")

print(df.dtypes)

# Convert categorical data to numerical (if needed)

df['Extracurricular Activities'] = df['Extracurricular Activities'].map({'Yes': 1, 'No': 0})

# Selecting features (independent variables) and target variable

X = df[['Hours Studied', 'Previous Scores', 'Extracurricular Activities', 'Sleep Hours', 'Sample Question Papers Practiced']]

y = df['Performance Index']

# Split dataset into training (80%) and testing (20%) sets

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Initialize and train the model

model = LinearRegression()

model.fit(X_train, y_train)

# Predict on test data

y_pred = model.predict(X_test)

# Visualizing results: Actual vs Predicted scores

plt.scatter(y_test, y_pred, color='blue')

plt.xlabel("Actual Performance Index")

plt.ylabel("Predicted Performance Index")

plt.title("Actual vs Predicted Performance Index")

plt.show()

# Model evaluation

mse = mean_squared_error(y_test, y_pred)

r2 = r2_score(y_test, y_pred)

print(f"\nMean Squared Error: {mse}")

print(f"R-squared Score: {r2}")

1.10.​ Model Scores:


PS E:\study\semester 4\AI\week 9> python task1.py

First few rows of the dataset:

   Hours Studied  Previous Scores  Extracurricular Activities  Sleep Hours  Sample Question Papers Practiced  Performance Index

0 7 99 Yes 9 1 91.0

1 4 82 No 4 2 65.0

2 8 51 Yes 7 2 45.0

3 5 52 Yes 5 2 36.0

4 7 75 No 8 5 66.0

Missing values in the dataset:

Hours Studied 0

Previous Scores 0

Extracurricular Activities 0

Sleep Hours 0

Sample Question Papers Practiced 0

Performance Index 0

dtype: int64

Hours Studied int64

Previous Scores int64

Extracurricular Activities object

Sleep Hours int64

Sample Question Papers Practiced int64

Performance Index float64

dtype: object


Mean Squared Error: 4.082628398521853

R-squared Score: 0.9889832909573145

Question No. 2
2.​ Logistic Regression
2.1.​ Importing Libraries:
●​ Essential Python libraries for data handling, visualization, and machine learning
are imported.

Code:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import confusion_matrix, classification_report, accuracy_score

Output:
No direct output, but the libraries are loaded successfully.

2.2.​ Load Dataset:


●​ Reads the dataset into a Pandas DataFrame and displays the first few rows.

Code:
df = pd.read_csv("student_logistic_regression.csv")  # Update with the actual file name
print(df.head())  # Display first few rows

Output:
A table displaying the first five rows of the dataset, similar to:

 User ID   Gender  Age  EstimatedSalary  Purchased
15624510     Male   19            19000          0
15810944     Male   35            20000          0
15668575   Female   26            43000          0
15603246   Female   27            57000          0
15804002     Male   19            76000          0

2.3.​ Data Preprocessing:


●​ Drops unnecessary columns and converts categorical data into numerical values.
●​ Feature scaling is applied later, in the model-training step (Section 2.5), for better model performance.

Code:
df = df.drop(columns=["User ID"]) # Remove irrelevant column
df = pd.get_dummies(df, columns=["Gender"], drop_first=True)  # Convert Gender to numeric
Output:
No direct output, but "User ID" is removed, and "Gender" is converted into a numeric column
(e.g., Gender_Male = 1 for Male, 0 for Female).

2.4.​ Model Initialization:


●​ Defines features (independent variables) and the target variable.
●​ Splits data into training and testing sets (80% training, 20% testing).

Code:
X = df.drop("Purchased", axis=1) # Features
y = df["Purchased"] # Target variable

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
Output:
No direct output, but the dataset is split into training and testing sets.

2.5.​ Model Training:


●​ Standardizes feature values for improved performance.
●​ Initializes and trains the Logistic Regression model.


Code:
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

model = LogisticRegression()
model.fit(X_train, y_train)
Output:
No direct output, but the Logistic Regression model is successfully trained on the dataset.

2.6.​ Model Prediction:


●​ Predicts outcomes using the trained model on the test dataset.

Code:
y_pred = model.predict(X_test)
Output:
No direct output, but y_pred contains the predicted values.

2.7.​ Visualizing Results:


●​ Compares actual vs. predicted values in a confusion matrix.
●​ The confusion matrix shows how well the model classified the test data.

Code:
cm = confusion_matrix(y_test, y_pred)
sns.heatmap(cm, annot=True, fmt='d', cmap="Blues")
plt.xlabel("Predicted")
plt.ylabel("Actual")
plt.title("Confusion Matrix")
plt.show()
Output:
A heatmap of the confusion matrix similar to:

Actual \ Predicted      0    1
0 (No Purchase)        18    2
1 (Purchase)            3    7


2.8.​ Model Evaluation:


●​ Displays accuracy, precision, recall, and F1-score of the model.

Code:
print("Classification Report:\n", classification_report(y_test, y_pred))
print("Accuracy Score:", accuracy_score(y_test, y_pred))
Output:
A classification report and accuracy score, for example:
Classification Report:
              precision    recall  f1-score   support

           0       0.86      0.90      0.88        20
           1       0.78      0.70      0.74        10

    accuracy                           0.83        30
   macro avg       0.82      0.80      0.81        30
weighted avg       0.83      0.83      0.83        30

Accuracy Score: 0.83


2.9.​ Final Output:

2.10.​ Full code:


# Importing Libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import confusion_matrix, classification_report, accuracy_score

# Load Dataset
df = pd.read_csv("Social_Network_Ads.csv")
print(df.head())  # Display first few rows

# Data Preprocessing
df = df.dropna()  # Remove missing values
df = df.drop(columns=["User ID"])  # Remove irrelevant column (as in Section 2.3)
df = pd.get_dummies(df, drop_first=True)  # Convert categorical variables to numeric

# Define Features and Target
X = df.drop("Purchased", axis=1)  # Replace with actual target column name
y = df["Purchased"]

# Split Data into Training and Testing Sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Feature Scaling
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

# Model Initialization and Training
model = LogisticRegression()
model.fit(X_train, y_train)

# Model Prediction
y_pred = model.predict(X_test)

# Evaluating the Model
print("Classification Report:\n", classification_report(y_test, y_pred))
print("Accuracy Score:", accuracy_score(y_test, y_pred))

# Confusion Matrix Visualization
cm = confusion_matrix(y_test, y_pred)
sns.heatmap(cm, annot=True, fmt='d', cmap="Blues")
plt.xlabel("Predicted")
plt.ylabel("Actual")
plt.title("Confusion Matrix")
plt.show()


Question No. 03
3.​ What are the evaluation metrics for a machine learning model?

Evaluation metrics are used to assess the performance of a machine learning model. The choice
of metric depends on the type of problem: classification, regression, or clustering.

3.1.​ Evaluation Metrics for Classification Model:


These metrics measure how well a model classifies data points into categories.

(a) Accuracy

●​ Measures the percentage of correctly predicted instances.


●​ Formula: $\text{Accuracy} = \frac{\text{Correct Predictions}}{\text{Total Predictions}}$
●​ Best for: Balanced datasets (when classes are equally distributed).

(b) Precision

●​ Measures how many predicted positive instances are actually positive.


●​ Formula: $\text{Precision} = \frac{\text{True Positives}}{\text{True Positives} + \text{False Positives}}$
●​ Best for: When false positives need to be minimized (e.g., spam detection).

(c) Recall (Sensitivity or True Positive Rate)

●​ Measures how many actual positive instances are correctly identified.


●​ Formula: $\text{Recall} = \frac{\text{True Positives}}{\text{True Positives} + \text{False Negatives}}$
●​ Best for: When false negatives need to be minimized (e.g., medical diagnoses).

(d) F1-Score

●​ Harmonic mean of Precision and Recall, providing a balance.


●​ Formula: $F1 = 2 \times \frac{\text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}}$
●​ Best for: Imbalanced datasets.


(e) Confusion Matrix

●​ A table that summarizes correct and incorrect predictions.


●​ Example:
Actual \ Predicted    Positive (1)          Negative (0)
Positive (1)          True Positive (TP)    False Negative (FN)
Negative (0)          False Positive (FP)   True Negative (TN)

(f) ROC Curve & AUC (Area Under Curve)

●​ ROC Curve: Plots True Positive Rate (Recall) vs. False Positive Rate.
●​ AUC Score: Measures the area under the ROC curve (closer to 1 is better).
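For illustration, these classification metrics can all be computed with scikit-learn. This is only a sketch, assuming y_test, y_pred, model, and X_test from the logistic regression task in Question No. 2:

Code:
from sklearn.metrics import (accuracy_score, precision_score, recall_score,
                             f1_score, confusion_matrix, roc_auc_score)

# Predicted probabilities for the positive class (needed for ROC AUC)
y_prob = model.predict_proba(X_test)[:, 1]

print("Accuracy :", accuracy_score(y_test, y_pred))
print("Precision:", precision_score(y_test, y_pred))
print("Recall   :", recall_score(y_test, y_pred))
print("F1-Score :", f1_score(y_test, y_pred))
print("Confusion Matrix:\n", confusion_matrix(y_test, y_pred))
print("ROC AUC  :", roc_auc_score(y_test, y_prob))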
3.2.​ Evaluation Metrics for Regression Models
These metrics evaluate how well a model predicts continuous values.

(a) Mean Absolute Error (MAE)

●​ Measures the average absolute difference between actual and predicted values.
●​ Formula: $MAE = \frac{1}{n} \sum_{i=1}^{n} |y_i - \hat{y}_i|$
●​ Best for: When all errors are treated equally.
(b) Mean Squared Error (MSE)

●​ Measures the average squared difference between actual and predicted values.
●​ Formula: $MSE = \frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2$
●​ Best for: Penalizing large errors more.

(c) Root Mean Squared Error (RMSE)

●​ Square root of MSE, providing an error in the same units as the target variable.
●​ Formula: $RMSE = \sqrt{MSE}$
●​ Best for: Interpretable errors in real-world scenarios.


(d) R-squared ($R^2$ Score)

●​ Measures how well the model explains variance in the data (0 to 1).
●​ Formula: $R^2 = 1 - \frac{\sum (y_i - \hat{y}_i)^2}{\sum (y_i - \bar{y})^2}$
●​ Best for: Checking model goodness-of-fit (higher is better).
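As an illustration, all four regression metrics can be computed in a few lines. This sketch assumes y_test and y_pred from the multiple linear regression task in Question No. 1:

Code:
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

mae = mean_absolute_error(y_test, y_pred)
mse = mean_squared_error(y_test, y_pred)
rmse = np.sqrt(mse)  # RMSE is the square root of MSE
r2 = r2_score(y_test, y_pred)

print(f"MAE: {mae}, MSE: {mse}, RMSE: {rmse}, R²: {r2}")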

3.3.​ Evaluation Metrics for Clustering Models


These metrics evaluate unsupervised learning models like K-Means or DBSCAN.

(a) Silhouette Score

●​ Measures how well clusters are separated.


●​ Range: -1 to 1 (higher is better).
(b) Davies-Bouldin Index

●​ Measures the compactness and separation of clusters.


●​ Lower values indicate better clustering.
(c) Dunn Index

●​ Measures the ratio of minimum inter-cluster distance to maximum intra-cluster


distance.
●​ Higher values indicate better clustering.
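For illustration, scikit-learn provides the Silhouette Score and Davies-Bouldin Index directly (the Dunn Index is not included and would need a separate implementation). A minimal sketch on toy data, assuming K-Means clustering:

Code:
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score, davies_bouldin_score

# Toy data and clustering purely for illustration
X, _ = make_blobs(n_samples=300, centers=3, random_state=42)
labels = KMeans(n_clusters=3, n_init=10, random_state=42).fit_predict(X)

print("Silhouette Score    :", silhouette_score(X, labels))      # closer to 1 is better
print("Davies-Bouldin Index:", davies_bouldin_score(X, labels))  # lower is better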
Conclusion
Choosing the right evaluation metric depends on the problem type:

●​ Classification: Accuracy, Precision, Recall, F1-score, Confusion Matrix, AUC-ROC
●​ Regression: MAE, MSE, RMSE, R-squared
●​ Clustering: Silhouette Score, Davies-Bouldin Index


Question No. 04
What are different methods or techniques used in Data Preprocessing?
(For example, how should missing or null values in a dataset be handled?)

4.​ Data Preprocessing Techniques


Data preprocessing is a crucial step in machine learning that involves cleaning, transforming, and
preparing raw data for modeling. Below are the key methods used in data preprocessing:

4.1.​ Handling Missing or Null Values


Missing values in a dataset can cause issues in model training. There are several ways to handle
them:
(a) Removing Missing Values

●​ Method: Drop rows or columns that contain missing values.
●​ Use When: The dataset is large, and missing values are minimal.
●​ Code Example (Pandas):
df.dropna(inplace=True)  # Removes rows with missing values
df.drop(columns=['ColumnName'], inplace=True)  # Removes a column with missing values
(b) Imputing Missing Values

●​ Method: Fill missing values with statistical measures (mean, median, mode) or interpolation.
●​ Use When: The dataset is small, and removing data is not ideal.
●​ Code Example:
df['ColumnName'].fillna(df['ColumnName'].mean(), inplace=True)    # Fill with mean
df['ColumnName'].fillna(df['ColumnName'].median(), inplace=True)  # Fill with median
df['ColumnName'].fillna(df['ColumnName'].mode()[0], inplace=True) # Fill with mode
(c) Using Machine Learning for Imputation

●​ Method: Predict missing values using K-Nearest Neighbors (KNN), Linear Regression, etc.
●​ Code Example (KNN Imputer):
from sklearn.impute import KNNImputer
imputer = KNNImputer(n_neighbors=3)
df[['Column1', 'Column2']] = imputer.fit_transform(df[['Column1', 'Column2']])

4.2.​ Handling Duplicate Data


Duplicate data can cause bias in the model.


●​ Remove Duplicates:
df.drop_duplicates(inplace=True)

4.3.​ Handling Outliers


Outliers are extreme values that can distort model performance.
(a) Using the IQR (Interquartile Range) Method

●​ Removes values beyond 1.5 times the IQR.
●​ Code Example:
Q1 = df['ColumnName'].quantile(0.25)
Q3 = df['ColumnName'].quantile(0.75)
IQR = Q3 - Q1
df = df[(df['ColumnName'] >= Q1 - 1.5 * IQR) & (df['ColumnName'] <= Q3 + 1.5 * IQR)]
(b) Using Z-Score Method

●​ Removes values that are more than a certain number of standard deviations away (commonly 3).
●​ Code Example:
import numpy as np
from scipy import stats
df = df[np.abs(stats.zscore(df['ColumnName'])) < 3]

4.4.​ Encoding Categorical Data


Categorical variables must be converted into numerical form for machine learning.
(a) Label Encoding (For Binary Categories)

●​ Converts categorical values into 0s and 1s.


●​ Example:
from sklearn.preprocessing import LabelEncoder
encoder = LabelEncoder()
df['CategoryColumn'] = encoder.fit_transform(df['CategoryColumn'])
(b) One-Hot Encoding (For Multiple Categories)

●​ Creates separate binary columns for each category.


●​ Example:
df = pd.get_dummies(df, columns=['CategoryColumn'], drop_first=True)
(c) Ordinal Encoding (For Ordered Categories)

●​ Assigns numerical values based on order (e.g., Low = 1, Medium = 2, High = 3).
●​ Example:
from sklearn.preprocessing import OrdinalEncoder
encoder = OrdinalEncoder(categories=[['Low', 'Medium', 'High']])
df['CategoryColumn'] = encoder.fit_transform(df[['CategoryColumn']])


4.5.​ Feature Scaling (Normalization & Standardization)


Scaling is used to ensure that all features contribute equally to the model.
(a) Min-Max Scaling (Normalization)

●​ Scales values between 0 and 1.


●​ Use When: The data has no outliers.
●​ Example:
from sklearn.preprocessing import MinMaxScaler
scaler = MinMaxScaler()
df[['Column1', 'Column2']] = scaler.fit_transform(df[['Column1', 'Column2']])
(b) Standardization (Z-Score Normalization)

●​ Scales values to have mean = 0 and standard deviation = 1.


●​ Use When: The data has outliers.
●​ Example:
from sklearn.preprocessing import StandardScaler
scaler = StandardScaler()
df[['Column1', 'Column2']] = scaler.fit_transform(df[['Column1', 'Column2']])

4.6.​ Feature Selection & Engineering


Selecting the most important features improves model performance.
(a) Removing Irrelevant Features

●​ Drop unnecessary columns:
df.drop(columns=['UnnecessaryColumn'], inplace=True)
(b) Using Correlation Matrix

●​ Identifies highly correlated features (multicollinearity) so that redundant ones can be removed.
●​ Example:
import seaborn as sns
import matplotlib.pyplot as plt
plt.figure(figsize=(10, 8))
sns.heatmap(df.corr(), annot=True, cmap='coolwarm')
plt.show()
(c) Using Feature Importance (Random Forest)

●​ Identifies the most important features.


●​ Example:
from sklearn.ensemble import RandomForestClassifier
model = RandomForestClassifier()
model.fit(X, y)
feature_importances = pd.Series(model.feature_importances_, index=X.columns)
feature_importances.nlargest(5).plot(kind='barh')

4.7.​ Splitting Data into Training and Testing Sets


To evaluate a model, data should be split into training and testing sets.
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2,
random_state=42)

Final Summary Table


Preprocessing Step          Methods
Handling Missing Values     Drop missing values, fill with mean/median/mode, KNN Imputer
Handling Duplicates         df.drop_duplicates()
Handling Outliers           IQR method, Z-score method
Encoding Categorical Data   Label Encoding, One-Hot Encoding, Ordinal Encoding
Feature Scaling             Min-Max Scaling, Standardization
Feature Selection           Correlation matrix, Feature importance (Random Forest)
Data Splitting              train_test_split() for training/testing

