Question 1

Task 1: Dealing with Missing Values
* I start by dropping the unnecessary columns; in particular, the Cabin feature has a very high number of missing values.

Outliers
Here I have used box plots to identify the outliers for all the important features. As we can see from the box plots:
* The Age plot shows some data values above 60. This is indicative of elderly people travelling on the ship rather than outliers.
* In the case of Pclass there is no logic in checking for outliers, as it only contains the three values 1, 2 and 3.
* The Fare feature seems to have a few outliers, but we can still ignore them as they are small in number.

Categorical Encoding
As we can see, there are some features that are in object format; these features must be converted to number format using some logic of character encoding in order to use them for training and testing the model.
* Using dataset.info() we found out that Sex and Embarked are the features which have object data types.
* encode_embarked() is a function used to convert S, C, Q to 0, 1, 2 respectively.
* Similarly, Male and Female are converted to binary 0 and 1 respectively. (A sketch of this step is given at the end of this question.)

Visualizing Feature-Target Dependence
A strip plot is used to analyze the distribution of each feature with respect to the target variable.
* Here we can observe that the data in the features Pclass, Sex, and Embarked is not evenly distributed; rather, it is concentrated at certain values. However, there is some level of spread observed in the case of Age and Fare.
* For the train-test split the scikit-learn library is used; the resulting distribution is shown in the report's plot.

Decision Tree Implementation

Classes

Node class:
* Purpose: represents a node in the decision tree, storing information about decision and leaf nodes.
* Attributes:
  * featureIndex: index of the feature used for splitting.
  * threshold: threshold value for the feature.
  * left: left child node.
  * right: right child node.
  * infoGain: information gain at the node.
  * value: value for leaf nodes.

DecisionTreeClassifier class:
* Purpose: implements a decision tree classifier.
* Attributes:
  * root: the root node of the decision tree.
  * maxDepth: maximum depth of the tree (stopping condition).
* Methods:
  * buildTree(dataset, currDepth)
  * contoCat(dataset, numSamples, numFeatures)
  * split(dataset, featureIndex, threshold)
  * informationGain(parent, leftChild, rightChild)
  * entropy(y)
  * calculateLeafValue(Y)
  * printTree(tree, indent, featureNames)
  * fit(X, Y)
  * infer(x)
  * makePrediction(x, tree)
  * calculate_accuracy(actual_labels, predicted_labels)

Important Functions
The contoCat and entropy functions appear in the original report as code screenshots. (A sketch of the entropy and information-gain logic is given at the end of this question.)

Model Training-Testing-Validation
* The best value for the maxDepth hyperparameter is found using the validation dataset of size 10: each iteration changes the max depth and the corresponding validation accuracy is calculated.
* Finally, the model is tested on the test dataset of size 20.
(Screenshot in the original: a per-depth accuracy log, e.g. "MaxDepth: 2, Accuracy: 0.747...", with other accuracies of about 0.826 and 0.803.)

Confusion Matrix
The confusion_matrix function calculates the confusion matrix, a table used in classification to evaluate the performance of a classification algorithm. Here is a brief explanation of the logic:

Input parameters:
* y_true: true labels of the test set.
* y_pred: predicted labels by the classifier.

Initialization:
* Unique classes are extracted from the concatenation of the true and predicted labels (unique_classes).
* The confusion matrix (conf_matrix) is initialized with zeros, where rows and columns correspond to the unique classes.

Filling the confusion matrix:
* Iterate through each pair of true and predicted labels.
* Identify the indices of the true and predicted labels in the unique_classes array.
* Increment the corresponding cell in the confusion matrix.

Output:
* Returns the filled confusion matrix. (The resulting matrix is shown as a screenshot in the original report.)

Precision, Recall, and F1 Score
Precision, recall, and F1 score are metrics commonly used to evaluate the performance of a classification model, especially in binary classification tasks.
* Precision is a measure of the accuracy of the positive predictions made by the model. Formula: Precision = True Positives / (True Positives + False Positives).
* Recall (sensitivity or true positive rate). Formula: Recall = True Positives / (True Positives + False Negatives).
* F1 score: a metric that combines precision and recall into a single value, F1 = 2 * Precision * Recall / (Precision + Recall). The F1 score ranges from 0 to 1, where 1 indicates perfect precision and recall, and 0 indicates poor performance.
(Output in the original report: Precision: 0.8387..., followed by the recall and F1 values.)
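A minimal sketch of the categorical-encoding step described above, assuming the data is loaded in a pandas DataFrame named dataset with Sex and Embarked columns; the helper name encode_embarked follows the report, while the lowercase "male"/"female" spellings are an assumption:

```python
import pandas as pd

def encode_embarked(dataset: pd.DataFrame) -> pd.DataFrame:
    # Map the object-typed columns to numeric codes: S, C, Q -> 0, 1, 2
    # and male, female -> 0, 1, as described in the text.
    dataset = dataset.copy()
    dataset["Embarked"] = dataset["Embarked"].map({"S": 0, "C": 1, "Q": 2})
    dataset["Sex"] = dataset["Sex"].map({"male": 0, "female": 1})
    return dataset
```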
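A compact sketch of the decision-tree logic described above. The names Node, DecisionTreeClassifier, maxDepth, buildTree, informationGain, entropy, fit, makePrediction and calculate_accuracy follow the report; bestSplit is a stand-in for the report's contoCat/split helpers, whose exact code is only visible as screenshots:

```python
import numpy as np

class Node:
    def __init__(self, featureIndex=None, threshold=None, left=None,
                 right=None, infoGain=None, value=None):
        self.featureIndex = featureIndex  # index of the feature used for splitting
        self.threshold = threshold        # threshold value for the feature
        self.left = left                  # left child (feature <= threshold)
        self.right = right                # right child (feature > threshold)
        self.infoGain = infoGain          # information gain at the node
        self.value = value                # majority class, set only for leaves

def entropy(y):
    # Shannon entropy of a 1-D label array
    _, counts = np.unique(y, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

def informationGain(parent, leftChild, rightChild):
    # reduction in entropy achieved by splitting parent into the two children
    wLeft = len(leftChild) / len(parent)
    wRight = len(rightChild) / len(parent)
    return entropy(parent) - wLeft * entropy(leftChild) - wRight * entropy(rightChild)

def calculate_accuracy(actual_labels, predicted_labels):
    return np.mean(np.asarray(actual_labels) == np.asarray(predicted_labels))

class DecisionTreeClassifier:
    def __init__(self, maxDepth=2):
        self.root = None
        self.maxDepth = maxDepth  # stopping condition

    def fit(self, X, Y):
        self.root = self.buildTree(X, Y, currDepth=0)

    def buildTree(self, X, Y, currDepth):
        # recurse until the node is pure or the depth limit is reached
        if currDepth < self.maxDepth and len(np.unique(Y)) > 1:
            feat, thr, gain = self.bestSplit(X, Y)
            if gain > 0:
                mask = X[:, feat] <= thr
                left = self.buildTree(X[mask], Y[mask], currDepth + 1)
                right = self.buildTree(X[~mask], Y[~mask], currDepth + 1)
                return Node(feat, thr, left, right, gain)
        # leaf: predict the majority class
        values, counts = np.unique(Y, return_counts=True)
        return Node(value=values[np.argmax(counts)])

    def bestSplit(self, X, Y):
        # exhaustive search over every feature and every observed threshold
        bestFeat, bestThr, bestGain = None, None, 0.0
        for feat in range(X.shape[1]):
            for thr in np.unique(X[:, feat]):
                mask = X[:, feat] <= thr
                if mask.all() or (~mask).all():
                    continue  # degenerate split, skip
                gain = informationGain(Y, Y[mask], Y[~mask])
                if gain > bestGain:
                    bestFeat, bestThr, bestGain = feat, thr, gain
        return bestFeat, bestThr, bestGain

    def makePrediction(self, x, tree):
        # walk from the root to a leaf, following the learned thresholds
        if tree.value is not None:
            return tree.value
        if x[tree.featureIndex] <= tree.threshold:
            return self.makePrediction(x, tree.left)
        return self.makePrediction(x, tree.right)

    def infer(self, X):
        return np.array([self.makePrediction(x, self.root) for x in X])
```

The hyperparameter search described above then amounts to training one such tree per candidate maxDepth and keeping the depth with the best validation accuracy.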
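And a sketch of the evaluation routines: confusion_matrix follows the step-by-step logic given above, while the precision/recall/F1 computation assumes the positive class is labelled 1, as in binary survival prediction:

```python
import numpy as np

def confusion_matrix(y_true, y_pred):
    # rows = true classes, columns = predicted classes
    unique_classes = np.unique(np.concatenate((y_true, y_pred)))
    conf_matrix = np.zeros((len(unique_classes), len(unique_classes)), dtype=int)
    for t, p in zip(y_true, y_pred):
        i = np.where(unique_classes == t)[0][0]  # row index of the true label
        j = np.where(unique_classes == p)[0][0]  # column index of the prediction
        conf_matrix[i, j] += 1
    return conf_matrix

def precision_recall_f1(y_true, y_pred, positive=1):
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    tp = np.sum((y_pred == positive) & (y_true == positive))
    fp = np.sum((y_pred == positive) & (y_true != positive))
    fn = np.sum((y_pred != positive) & (y_true == positive))
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1
```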
Question 2

Dataset Exploration
* The given dataset "tvmarketing.csv" has two columns, with features "TV" and "Sales".
* We observe that there are no null values and that the datatype of each column is numeric.
* The summary statistics for each column are shown in the report.

Outliers
* Here we have plotted distribution plots to see whether the features are normally distributed or skewed in nature.
* We can clearly see that the Sales feature is normally distributed; however, the TV feature is skewed in nature, which means the data has to be normalised before it can be brought into use.

Feature-Target Dependence
* We have used a scatter plot representation to observe the relation between the target feature Sales and TV.
* We then compute the mean and standard deviation of the dataset for each feature.

Normalization of the TV Marketing Budget and Sales Columns
* Each feature is normalized using the given formula, X_norm = (X - X_min) / (X_max - X_min).
* We can clearly see that the data has now been confined to the range [0, 1]. (A sketch of this step is given at the end of this question.)

Train-Test Split
The scikit-learn library is used to split the data into train and test segments with ratio 80:20.

Task-3: Linear Regression Implementation

Hypothesis function
Predicts the output (y_pred) using the linear equation with the given weight (w1) and bias (w0): y_pred = w1 * x + w0.

Cost function
Computes the mean squared error (MSE) between the predicted values and the actual output, providing a measure of how well the model is performing.

Gradient descent function
* Initializes random weights (w1 and w0) and performs iterative updates to minimize the cost.
* In each iteration it:
  a. calculates the gradients (w1_grad and w0_grad) by summing the product of the prediction errors and the input features;
  b. updates the weights using the learning rate (alpha) and the average of the gradients;
  c. appends the current cost to the cost_list.

Input data:
* X_train_array: normalized TV feature values.
* y_train_array: normalized Sales values.

Hyperparameters:
* alpha: learning rate, determining the step size of each update.
* iterations: number of iterations for which the weights are updated.

The Mean Squared Error and Mean Absolute Error are then calculated on the test set as per the formulas shown in the report, with the outputs displayed there. (A sketch of the full training loop follows the normalization sketch below.)
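A one-liner for the normalization step described above (min-max rescaling, applied to one column at a time):

```python
import numpy as np

def min_max_normalize(x: np.ndarray) -> np.ndarray:
    # Rescale linearly so the minimum maps to 0 and the maximum to 1.
    return (x - x.min()) / (x.max() - x.min())
```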
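And a sketch of the training loop, with the names w1, w0, alpha, cost_list, X_train_array and y_train_array following the report. The random initialization and the averaged gradients match the description above; note that the factor of 2 from the MSE derivative is folded into alpha, a common simplification that is an assumption here:

```python
import numpy as np

def hypothesis(x, w1, w0):
    # linear prediction y_pred = w1 * x + w0
    return w1 * x + w0

def cost_function(x, y, w1, w0):
    # mean squared error between predictions and actual outputs
    return np.mean((hypothesis(x, w1, w0) - y) ** 2)

def gradient_descent(x, y, alpha=0.01, iterations=1000):
    rng = np.random.default_rng(0)
    w1, w0 = rng.standard_normal(2)  # random initial weights
    cost_list = []
    n = len(x)
    for _ in range(iterations):
        error = hypothesis(x, w1, w0) - y
        w1_grad = np.sum(error * x) / n  # average of error * input
        w0_grad = np.sum(error) / n      # average of error
        w1 -= alpha * w1_grad
        w0 -= alpha * w0_grad
        cost_list.append(cost_function(x, y, w1, w0))
    return w1, w0, cost_list

# usage, given the normalized training arrays from the report:
# w1, w0, cost_list = gradient_descent(X_train_array, y_train_array)
```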
Question 3

Dataset Exploration
* The given dataset "boston.csv" has 14 columns in total.
* We observe that there are no null values and that the datatype of each column is numeric.

Outliers
* We have plotted a correlation-matrix heat map to study the features.
* RAD and TAX are closely correlated to each other, as visible in the heat map, and the CHAS column's correlation with MEDV is very low; hence it can be removed in the pre-processing step.

Feature-Target Dependence
* We have used scatter plot representations to observe the relation between the target feature MEDV and the other 13 features.
* We then compute the mean and standard deviation of the dataset for each feature.

Normalization
* Here we plot distribution plots to see whether the features follow a normal distribution or are skewed in nature.
* We can see that the majority of the columns are skewed in nature, which means that they need to be normalised before we can actually use them.
* Each feature is normalized using the given formula (the same rescaling as in Question 2); the normalized features are shown in the report. We can clearly see that the data has now been confined to the same range across all columns.

Task-3: Linear Regression Implementation

Difference between the simple linear regression and the multivariable regression implementation:

Hypothesis function
The y predicted earlier was Y = A + B*x, but now it has become Y = w1*X1 + w2*X2 + ... + wn*Xn + w0.
* The hypothesis function is updated to handle multiple features: np.dot(X, parameters).

mean_squared_error
* This will calculate the error for all the samples simultaneously: np.mean((y_pred - y_actual)**2).

Gradient descent function
* The gradient descent function will have multiple coefficients (weights), one corresponding to each feature. The gradients of all the parameters are calculated for the multiple features at once.

mean_absolute_error
* This will calculate the error for all the samples simultaneously: np.mean(np.abs(y_pred - y_actual)).

Error vs Iteration Plot
* We have plotted the Mean Squared Error vs iteration plot ("Training Loss Over Iterations") to see how the model improves in successive iterations.
* Finally, the Mean Squared Error and Mean Absolute Error on the test set have been calculated. (A sketch of the multivariable version follows.)
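A sketch of the multivariable version. It assumes the normalized features sit in an (m, n) numpy matrix X to which a column of ones has been appended, so that the single parameters vector carries w1..wn together with the bias w0, matching np.dot(X, parameters); the error functions follow the np.mean formulas quoted above:

```python
import numpy as np

def hypothesis(X, parameters):
    # Y = w1*X1 + ... + wn*Xn + w0, computed for all samples at once
    return np.dot(X, parameters)

def mean_squared_error(y_pred, y_actual):
    return np.mean((y_pred - y_actual) ** 2)

def mean_absolute_error(y_pred, y_actual):
    return np.mean(np.abs(y_pred - y_actual))

def gradient_descent(X, y, alpha=0.01, iterations=1000):
    m, n = X.shape
    parameters = np.zeros(n)  # one coefficient per feature, including the bias column
    cost_list = []
    for _ in range(iterations):
        error = hypothesis(X, parameters) - y
        gradients = np.dot(X.T, error) / m  # all feature gradients in one step
        parameters -= alpha * gradients
        cost_list.append(mean_squared_error(hypothesis(X, parameters), y))
    return parameters, cost_list
```

Plotting cost_list against the iteration index reproduces the "Training Loss Over Iterations" curve described above, and using a single TV column plus the bias recovers the simple regression from Question 2.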
