manual(2023-CS-156).docx
Session 2023-2027
Submitted By:
2023-CS-156 Khola Raouf
Supervised By:
Mr. Waseem
Course:
CSC-371 Artificial Intelligence
Contents
Week # 9
Question No. 1
1.1. Importing Libraries:
1.2. Load the Dataset:
1.3. Data Preprocessing:
1.4. Model Initialization:
1.5. Model Prediction:
1.6. Visualizing Results:
1.7. Evaluating the Model:
1.8. Final Output:
1.9. Full Code:
1.10. Model Scores:
Question No. 2
2.1. Importing Libraries:
2.2. Load Dataset:
2.3. Data Preprocessing:
2.4. Model Initialization:
2.5. Model Training:
2.6. Model Prediction:
2.7. Visualizing Results:
2.8. Model Evaluation:
2.9. Final Output:
2.10. Full code:
Question No. 03
3.1. Evaluation Metrics for Classification Model:
(a) Accuracy
(b) Precision
(c) Recall (Sensitivity or True Positive Rate)
(d) F1-Score
2023-CS-156
AI Tasks week#9
Week # 9
Question No. 1
Multiple Linear Regression
1.1. Importing Libraries:
First, import the Python libraries required for data handling (pandas,
numpy), visualization (matplotlib, seaborn), and machine learning
(sklearn).
Code:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
No output; libraries are just being imported.
Hours_Studied 0
Previous_Scores 0
Extracurricular 0
Final_Score 0
dtype: int64
Hours_Studied float64
Previous_Scores int64
Extracurricular int64
Final_Score int64
dtype: object
y = df['Final_Score']
No direct output, but X and y now store data for model training.
Code:
X_train, X_test, y_train, y_test = train_test_split(X, y,
test_size=0.2, random_state=42)
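As a rough illustration of what this split produces, the sketch below applies the same call to a small made-up DataFrame. The 10 rows and their values are assumptions for demonstration only, not the actual dataset. With test_size=0.2, 20% of the rows are held out for testing.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical 10-row dataset; column names follow the report, values are made up
df_demo = pd.DataFrame({
    "Hours_Studied": [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
    "Previous_Scores": [40, 45, 50, 55, 60, 65, 70, 75, 80, 85],
    "Final_Score": [35, 42, 50, 54, 61, 68, 72, 79, 83, 90],
})
X_demo = df_demo[["Hours_Studied", "Previous_Scores"]]  # features
y_demo = df_demo["Final_Score"]                         # target

# Same arguments as in the report: 20% test data, fixed seed for repeatability
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=42
)
print(X_tr.shape, X_te.shape)  # 8 training rows and 2 test rows
```

Fixing random_state means the same rows land in the test set on every run, which makes the reported scores reproducible.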
A scatter plot showing how well the model's predictions match the
actual scores.
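A plot of this kind could be produced roughly as follows. This is a minimal sketch with made-up values standing in for y_test and y_pred; the axis labels and the dashed reference line are assumptions, not necessarily what the original figure used.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

# Hypothetical actual and predicted scores (stand-ins for y_test and y_pred)
y_actual = np.array([45.0, 60.0, 72.0, 88.0, 95.0])
y_predicted = np.array([48.0, 58.0, 75.0, 85.0, 93.0])

fig, ax = plt.subplots()
ax.scatter(y_actual, y_predicted, label="Predictions")

# Diagonal reference line: points lying on it are perfect predictions
limits = [y_actual.min(), y_actual.max()]
ax.plot(limits, limits, linestyle="--", color="red", label="Perfect fit")

ax.set_xlabel("Actual Score")
ax.set_ylabel("Predicted Score")
ax.set_title("Actual vs. Predicted Scores")
ax.legend()
# In an interactive session, drop the "Agg" line and call plt.show() here
```

The closer the points sit to the dashed diagonal, the better the predictions match the actual scores.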
Code:
mse = mean_squared_error(y_test, y_pred)
r2 = r2_score(y_test, y_pred)
import numpy as np
# Load dataset
print(df.isnull().sum())
print(df.dtypes)
y = df['Performance Index']
model = LinearRegression()
model.fit(X_train, y_train)
y_pred = model.predict(X_test)
plt.show()
# Model evaluation
r2 = r2_score(y_test, y_pred)
0 7 99 Yes 9 1 91.0
1 4 82 No 4 2 65.0
2 8 51 Yes 7 2 45.0
3 5 52 Yes 5 2 36.0
4 7 75 No 8 5 66.0
Hours Studied 0
Previous Scores 0
Extracurricular Activities 0
Sleep Hours 0
Performance Index 0
dtype: int64
dtype: object
Question No. 2
2. Logistic Regression
2.1. Importing Libraries:
● Essential Python libraries for data handling, visualization, and machine learning
are imported.
Code:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import confusion_matrix, classification_report, accuracy_score
Output:
No direct output, but the libraries are loaded successfully.
Code:
df = pd.read_csv("student_logistic_regression.csv")  # Update with actual file name
print(df.head()) # Display first few rows
Output:
A table displaying the first five rows of the dataset, similar to:
User ID    Gender  Age  EstimatedSalary  Purchased
15624510   Male    19   19000            0
Code:
df = df.drop(columns=["User ID"]) # Remove irrelevant column
df = pd.get_dummies(df, columns=["Gender"], drop_first=True)  # Convert Gender to numeric
Output:
No direct output, but "User ID" is removed, and "Gender" is converted into a numeric column
(e.g., Gender_Male = 1 for Male, 0 for Female).
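To see the encoding on a few rows, here is a small sketch with made-up data. The three rows are hypothetical, and dtype=int is an added assumption so the result prints as 0/1, since recent pandas versions return True/False by default.

```python
import pandas as pd

# Hypothetical rows shaped like the dataset preview; values are made up
demo = pd.DataFrame({
    "Gender": ["Male", "Female", "Male"],
    "Age": [19, 35, 26],
})

# drop_first=True drops the first category (Female), leaving one indicator column.
# dtype=int forces 0/1 output instead of True/False.
encoded = pd.get_dummies(demo, columns=["Gender"], drop_first=True, dtype=int)
print(encoded)
```

Only Gender_Male remains because a single column is enough to tell the two categories apart.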
Code:
X = df.drop("Purchased", axis=1) # Features
y = df["Purchased"] # Target variable
Code:
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)
model = LogisticRegression()
model.fit(X_train, y_train)
Output:
No direct output, but the Logistic Regression model is successfully trained on the dataset.
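The scaling step above calls fit_transform on the training data but only transform on the test data. The sketch below, using made-up Age and EstimatedSalary values, shows why: the mean and standard deviation are learned from the training rows and then reused, so no information from the test set leaks into preprocessing.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Hypothetical Age / EstimatedSalary values, made up for illustration
X_train_demo = np.array([[20.0, 20000.0], [30.0, 40000.0], [40.0, 60000.0]])
X_test_demo = np.array([[30.0, 80000.0]])

scaler_demo = StandardScaler()
X_train_scaled = scaler_demo.fit_transform(X_train_demo)  # learns mean and std from training rows
X_test_scaled = scaler_demo.transform(X_test_demo)        # reuses the training mean and std

print("Training means:", scaler_demo.mean_)
print("Scaled test row:", X_test_scaled)
```

After scaling, each training column has mean 0 and unit variance, which keeps a large-valued feature such as salary from dominating the model.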
Code:
y_pred = model.predict(X_test)
Output:
No direct output, but y_pred contains the predicted values.
Code:
cm = confusion_matrix(y_test, y_pred)
sns.heatmap(cm, annot=True, fmt='d', cmap="Blues")
plt.xlabel("Predicted")
plt.ylabel("Actual")
plt.title("Confusion Matrix")
plt.show()
Output:
A heatmap of the confusion matrix similar to:
Actual \ Predicted     0     1
0 (No Purchase)       18     2
1 (Purchase)           3     7
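The usual classification metrics can be worked out by hand from these four counts. The sketch below assumes rows are actual classes and columns are predicted classes, as the axis labels in the heatmap code suggest.

```python
# Counts taken from the example confusion matrix (rows = actual, columns = predicted)
tn, fp = 18, 2   # actual 0: predicted 0, predicted 1
fn, tp = 3, 7    # actual 1: predicted 0, predicted 1

accuracy = (tp + tn) / (tp + tn + fp + fn)          # share of all predictions that are correct
precision = tp / (tp + fp)                          # share of predicted purchases that are real
recall = tp / (tp + fn)                             # share of real purchases that were found
f1 = 2 * precision * recall / (precision + recall)  # harmonic mean of precision and recall

print(f"Accuracy:  {accuracy:.2f}")   # 0.83
print(f"Precision: {precision:.2f}")  # 0.78
print(f"Recall:    {recall:.2f}")     # 0.70
print(f"F1-score:  {f1:.2f}")         # 0.74
```

Precision, recall, and F1 here are for class 1 (Purchase); the accuracy of 0.83 agrees with the example classification report.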
Code:
print("Classification Report:\n", classification_report(y_test, y_pred))
print("Accuracy Score:", accuracy_score(y_test, y_pred))
Output:
A classification report and accuracy score, for example:
Classification Report:
              precision    recall  f1-score   support
    accuracy                           0.83        30
   macro avg       0.82      0.80      0.81        30
weighted avg       0.83      0.83      0.83        30
Question No. 03
3. What are the evaluation metrics for a machine learning model?
Evaluation metrics are used to assess the performance of a machine learning model. The choice
of metric depends on the type of problem: classification, regression, or clustering.
3.1. Evaluation Metrics for Classification Models
(a) Accuracy
(b) Precision
(c) Recall (Sensitivity or True Positive Rate)
(d) F1-Score
● ROC Curve: Plots True Positive Rate (Recall) vs. False Positive Rate.
● AUC Score: Measures the area under the ROC curve (closer to 1 is better).
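A small sketch of both ideas, assuming scikit-learn's roc_curve and roc_auc_score and four made-up predictions (the labels and probabilities are hypothetical):

```python
from sklearn.metrics import roc_auc_score, roc_curve

# Hypothetical true labels and predicted probabilities for the positive class
y_true = [0, 0, 1, 1]
y_scores = [0.1, 0.4, 0.35, 0.8]

# Points of the ROC curve: false positive rate and true positive rate per threshold
fpr, tpr, thresholds = roc_curve(y_true, y_scores)

# Area under that curve
auc = roc_auc_score(y_true, y_scores)
print("AUC:", auc)  # 0.75
```

The value 0.75 has a direct reading: of the four (positive, negative) pairs, the positive example gets the higher score in three.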
3.2. Evaluation Metrics for Regression Models
These metrics evaluate how well a model predicts continuous values.
(a) Mean Absolute Error (MAE)
● Measures the average absolute difference between actual and predicted values.
● Formula: MAE = \frac{1}{n} \sum_{i=1}^{n} | y_i - \hat{y_i} |
● Best for: When all errors are treated equally.
(b) Mean Squared Error (MSE)
● Measures the average squared difference between actual and predicted values.
● Formula: MSE = \frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y_i})^2
● Best for: Penalizing large errors more.
(c) Root Mean Squared Error (RMSE)
● Square root of MSE, providing an error in the same units as the target variable.
● Formula: RMSE = \sqrt{MSE}
● Best for: Interpretable errors in real-world scenarios.
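The three error formulas above can be checked on a tiny example. The values below are made up, with one deliberately large error to show how MSE and RMSE react to it more strongly than MAE.

```python
import numpy as np

# Hypothetical actual and predicted values; the last prediction is off by 4
y_true = np.array([3.0, 5.0, 7.0, 9.0])
y_pred = np.array([2.0, 5.0, 8.0, 13.0])

errors = y_true - y_pred            # [1, 0, -1, -4]

mae = np.mean(np.abs(errors))       # (1 + 0 + 1 + 4) / 4
mse = np.mean(errors ** 2)          # (1 + 0 + 1 + 16) / 4
rmse = np.sqrt(mse)

print("MAE: ", mae)                 # 1.5
print("MSE: ", mse)                 # 4.5
print("RMSE:", round(rmse, 4))      # 2.1213
```

The single error of 4 contributes 16 of the 18 squared-error units, which is why MSE and RMSE are the usual choice when large errors should be penalized more.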
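(d) R² Score (Coefficient of Determination)
● Measures the proportion of variance in the target that the model explains (usually between 0 and 1; it can be negative when the model fits worse than always predicting the mean).
● Formula: R^2 = 1 - \frac{\sum (y_i - \hat{y_i})^2}{\sum (y_i - \bar{y})^2}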
● Best for: Checking model goodness-of-fit (higher is better).
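As a worked check of this formula, the sketch below computes R² directly with NumPy on made-up values and compares it with a baseline that always predicts the mean.

```python
import numpy as np

# Hypothetical actual and predicted values
y_true = np.array([3.0, 5.0, 7.0, 9.0])
y_pred = np.array([2.5, 5.0, 7.5, 9.0])

ss_res = np.sum((y_true - y_pred) ** 2)         # residual sum of squares = 0.5
ss_tot = np.sum((y_true - y_true.mean()) ** 2)  # total sum of squares = 20.0
r2 = 1 - ss_res / ss_tot
print("R^2:", r2)                               # 0.975

# Baseline: always predicting the mean of y gives R^2 = 0
baseline = np.full_like(y_true, y_true.mean())
r2_baseline = 1 - np.sum((y_true - baseline) ** 2) / ss_tot
print("Baseline R^2:", r2_baseline)             # 0.0
```

An R² of 0.975 means the predictions leave only 2.5% of the variance in y unexplained.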
Question No. 04
What are different methods or techniques used in Data Preprocessing?
(For example, how do we handle missing or null values in a dataset?)
(a) Removing Missing Values
● Method: Drop rows or columns that contain missing values.
● Use When: The dataset is large, and missing values are minimal.
● Code Example (Pandas):
df.dropna(inplace=True)  # Removes rows with missing values
df.drop(columns=['ColumnName'], inplace=True)  # Removes a column with missing values
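To see the effect of both operations, here is a small sketch on a made-up DataFrame. The column names and values are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical data: "Notes" is mostly missing, the other columns have one gap each
demo = pd.DataFrame({
    "Hours": [2.0, np.nan, 5.0, 7.0],
    "Score": [50.0, 60.0, np.nan, 90.0],
    "Notes": [np.nan, np.nan, np.nan, "ok"],
})

# Drop the mostly empty column first, then drop rows that still contain NaN
cleaned = demo.drop(columns=["Notes"]).dropna()
print(len(demo), "->", len(cleaned))  # 4 -> 2
```

Dropping the sparse column first matters here: calling dropna() on the original frame would keep only the last row, because the first three rows are all missing "Notes".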
(b) Imputing Missing Values
● Method: Fill missing values with statistical measures (mean, median, mode) or
interpolation.
● Use When: The dataset is small, and removing data is not ideal.
● Code Example:
df['ColumnName'] = df['ColumnName'].fillna(df['ColumnName'].mean())  # Fill with mean
df['ColumnName'] = df['ColumnName'].fillna(df['ColumnName'].median())  # Fill with median
df['ColumnName'] = df['ColumnName'].fillna(df['ColumnName'].mode()[0])  # Fill with mode
(c) Using Machine Learning for Imputation
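One possible way to do this is scikit-learn's KNNImputer, which fills each missing entry from the rows most similar to it. The sketch below is an assumption about which technique fits this heading, and the small array is made up.

```python
import numpy as np
from sklearn.impute import KNNImputer

# Hypothetical feature matrix with two missing entries
X_demo = np.array([
    [1.0, 2.0, np.nan],
    [3.0, 4.0, 3.0],
    [np.nan, 6.0, 5.0],
    [8.0, 8.0, 7.0],
])

# Each missing value is replaced by the mean of that feature
# over the 2 nearest rows (distance computed on the observed features)
imputer = KNNImputer(n_neighbors=2)
X_filled = imputer.fit_transform(X_demo)
print(X_filled)
```

Unlike filling with a single column mean, the imputed value depends on which rows are closest, so it can follow relationships between features.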
● Remove Duplicates:
df.drop_duplicates(inplace=True)
● Removes values that are more than a certain number of standard deviations away from the mean.
● Code Example:
import numpy as np
from scipy import stats
df = df[np.abs(stats.zscore(df['ColumnName'])) < 3]  # Keep rows within 3 standard deviations
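The same filter can be traced on a small made-up column. This sketch computes the z-score by hand with pandas (using the population standard deviation, which is what scipy.stats.zscore uses by default), so the arithmetic is visible; the data values are hypothetical.

```python
import numpy as np
import pandas as pd

# 20 hypothetical ordinary values plus one extreme value
values = [48, 49, 50, 51, 52] * 4 + [500]
demo = pd.DataFrame({"ColumnName": values})

col = demo["ColumnName"]
# z-score: distance from the mean in units of standard deviation (ddof=0)
z = (col - col.mean()) / col.std(ddof=0)

# Keep only rows whose z-score is within 3 standard deviations
filtered = demo[np.abs(z) < 3]
print(len(demo), "->", len(filtered))  # 21 -> 20
```

Only the value 500 is removed; its z-score is roughly 4.5, while every other value stays well below 1.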
● Assigns numerical values based on order (e.g., Low < Medium < High). Note that OrdinalEncoder numbers the listed categories from 0, so Low = 0, Medium = 1, High = 2.
● Example:
from sklearn.preprocessing import OrdinalEncoder
encoder = OrdinalEncoder(categories=[['Low', 'Medium', 'High']])
df['CategoryColumn'] = encoder.fit_transform(df[['CategoryColumn']])
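Running the encoder on a few made-up values shows the codes it produces. The four rows below are hypothetical; the codes follow the order given in categories and start at 0.

```python
import pandas as pd
from sklearn.preprocessing import OrdinalEncoder

# Hypothetical column with three ordered categories
demo = pd.DataFrame({"CategoryColumn": ["Low", "High", "Medium", "Low"]})

# The list order defines the ranking: Low -> 0, Medium -> 1, High -> 2
encoder = OrdinalEncoder(categories=[["Low", "Medium", "High"]])
demo["Encoded"] = encoder.fit_transform(demo[["CategoryColumn"]]).ravel()
print(demo)
```

If a 1-to-3 scale is preferred, adding 1 to the encoded column gives Low = 1, Medium = 2, High = 3.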
feature_importances = pd.Series(model.feature_importances_, index=X.columns)
feature_importances.nlargest(5).plot(kind='barh')
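The feature_importances_ attribute is available on tree-based models. The sketch below assumes a RandomForestClassifier and synthetic data in which only one column (named "signal" here, a hypothetical name) actually determines the target, so the ranking is easy to verify.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

# Synthetic data: the target depends only on "signal"; the other columns are noise
rng = np.random.default_rng(0)
X_demo = pd.DataFrame({
    "signal": rng.normal(size=200),
    "noise_a": rng.normal(size=200),
    "noise_b": rng.normal(size=200),
})
y_demo = (X_demo["signal"] > 0).astype(int)

model_demo = RandomForestClassifier(n_estimators=50, random_state=0)
model_demo.fit(X_demo, y_demo)

# Importances sum to 1; larger values mean the feature was used more for splitting
feature_importances = pd.Series(model_demo.feature_importances_, index=X_demo.columns)
print(feature_importances.sort_values(ascending=False))
```

The "signal" column should rank first by a wide margin, which is the pattern to look for when selecting features on real data.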