
Electrical and Computer Engineering Department

Machine Learning and Data Science ‐ ENCS5341


Assignment #2
Submission deadline: 23/11/2024

This assignment may be completed by a group of up to two students.

Machine Learning Project: Regression Analysis and Model Selection

Project Overview

The objective of this project is to build a series of regression models using a dataset, evaluate and
compare their performance, and apply various techniques to improve model accuracy and prevent
overfitting. The focus will be on both linear and nonlinear regression models. Students will also use
feature selection methods and regularization techniques, followed by hyperparameter tuning, to
select the optimal model. This project can be done in groups of two students at most.

Steps and Requirements

1. Dataset
Cars Dataset: a dataset scraped from the online car-selling website YallaMotors.
https://www.kaggle.com/datasets/ahmedwaelnasef/cars-dataset/data

This dataset, scraped from the YallaMotors website using Python and the Requests-HTML
library, includes around 6,750 rows and 9 columns. It’s well-suited for Exploratory Data
Analysis (EDA) and machine learning tasks, particularly for predictive modeling using
algorithms like Linear Regression. The main task for this dataset is to predict car prices,
making it well suited to developing regression models that capture the relationship between
various features (e.g., car make, model, year, mileage, and engine size) and the target
variable (car price).

Through EDA, you can explore patterns, outliers, and relationships in the data, which will
help refine your model. For the machine learning task, Linear Regression could be a good
starting point, but more complex regression models can also be applied if necessary to
capture non-linear relationships and improve predictive accuracy.

Note that car prices are listed in various currencies. To ensure consistency, you may need to
standardize all prices to a common currency, such as USD, for a uniform target variable.
This will help avoid discrepancies and improve the accuracy of any predictive modeling.

Data Preprocessing Steps:

o Clean the dataset by handling missing values, encoding categorical features, and
normalizing or standardizing numerical features if necessary.
o Split the dataset into training, validation, and test sets. A common split would be
60% for training, 20% for validation, and 20% for testing (a sketch of these
preprocessing steps is given below).
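
The sketch below illustrates one way to carry out these preprocessing and splitting steps. The file name cars.csv, the price column name, the decision to drop incomplete rows, and the use of one-hot encoding are illustrative assumptions, not the dataset's actual schema or a required approach.

import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Load the scraped dataset; the file name and the "price" column name are placeholders.
df = pd.read_csv("cars.csv")

# Handle missing values (simplest option: drop incomplete rows).
df = df.dropna()

# NOTE: prices appear in several currencies in the raw data; convert them to a single
# currency (e.g., USD) at this point, as discussed in the dataset description above.

# One-hot encode categorical features; numeric columns pass through unchanged.
X = pd.get_dummies(df.drop(columns=["price"]), drop_first=True)
y = df["price"]

# 60/20/20 split: carve off the test set first, then split the remainder 75/25.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.20, random_state=42)
X_train, X_val, y_train, y_val = train_test_split(X_train, y_train, test_size=0.25, random_state=42)

# Standardize features using statistics computed on the training set only.
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_val = scaler.transform(X_val)
X_test = scaler.transform(X_test)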
2. Building Regression Models
Implement a set of linear and nonlinear regression models:
o Linear Models: Linear Regression, LASSO (L1 Regularization), Ridge Regression
(L2 Regularization). See the details in point 5 below.
o Use the closed-form solution: apply the closed-form solution for the linear
regression model to solve the system of linear equations (the normal equations) and
obtain the model parameters. Then compare this model with the one obtained using
gradient descent. Implement this part without using any external APIs or libraries
for linear regression; a sketch is given after this step.
o Nonlinear Models: Polynomial Regression (varying the polynomial degree from 2 to
10) and regression with a standard Gaussian kernel, referred to here as a Radial
Basis Function (RBF) kernel.
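
For the closed-form requirement, the sketch below implements both the normal-equation solution, theta = (X^T X)^(-1) X^T y, and batch gradient descent using only NumPy. The learning rate and iteration count are illustrative choices, and features are assumed to be standardized already (as in the preprocessing sketch in step 1).

import numpy as np

def closed_form(X, y):
    # Normal equations: theta = (X^T X)^+ X^T y; pinv guards against a singular X^T X.
    X, y = np.asarray(X, dtype=float), np.asarray(y, dtype=float)
    Xb = np.c_[np.ones(len(X)), X]          # prepend a bias column of ones
    return np.linalg.pinv(Xb.T @ Xb) @ (Xb.T @ y)

def gradient_descent(X, y, lr=0.01, n_iters=5000):
    # Batch gradient descent on the MSE loss; lr and n_iters are illustrative only.
    X, y = np.asarray(X, dtype=float), np.asarray(y, dtype=float)
    Xb = np.c_[np.ones(len(X)), X]
    theta = np.zeros(Xb.shape[1])
    m = len(y)
    for _ in range(n_iters):
        grad = (2.0 / m) * Xb.T @ (Xb @ theta - y)
        theta -= lr * grad
    return theta

# The two parameter vectors should agree closely once gradient descent has converged:
# theta_cf = closed_form(X_train, y_train)
# theta_gd = gradient_descent(X_train, y_train)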
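
For the nonlinear models in this step, one possible sketch using scikit-learn is shown below, reusing the split arrays from step 1. The choice of KernelRidge for the RBF model and the gamma value are assumptions; in practice you may also want to restrict the polynomial expansion to a few numeric features so the feature count stays manageable at high degrees.

from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.kernel_ridge import KernelRidge
from sklearn.metrics import mean_squared_error

# One polynomial-regression pipeline per degree 2..10, scored on the validation set.
poly_mse = {}
for d in range(2, 11):
    model = make_pipeline(PolynomialFeatures(degree=d), LinearRegression())
    model.fit(X_train, y_train)
    poly_mse[d] = mean_squared_error(y_val, model.predict(X_val))

# Gaussian (RBF) kernel regression; kernel ridge is one common choice, gamma is illustrative.
rbf_model = KernelRidge(kernel="rbf", gamma=0.1).fit(X_train, y_train)
rbf_mse = mean_squared_error(y_val, rbf_model.predict(X_val))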
3. Model Selection Using Validation Set
Evaluate each model on the validation set and use
metrics such as Mean Squared Error (MSE), Mean Absolute Error (MAE), and R-squared to
compare their performance. The best model can be selected based on the lowest MSE or
highest R-squared score on the validation set.
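
A short sketch of this validation-set comparison is given below, assuming each fitted model exposes a scikit-learn-style predict method and that the models are stored in a dictionary (the name fitted_models is illustrative).

from sklearn.metrics import mean_squared_error, mean_absolute_error, r2_score

def evaluate(model, X_val, y_val):
    # Return (MSE, MAE, R^2) for one fitted model on the validation set.
    pred = model.predict(X_val)
    return (mean_squared_error(y_val, pred),
            mean_absolute_error(y_val, pred),
            r2_score(y_val, pred))

# results = {name: evaluate(m, X_val, y_val) for name, m in fitted_models.items()}
# best_name = min(results, key=lambda name: results[name][0])   # lowest validation MSE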
4. Feature Selection with Forward Selection
Use a forward selection method to iteratively
add features to the model, selecting features that improve model performance. The forward
selection process will:
o Start with an empty model and gradually add features one at a time.
o At each step, add the feature that, when included, minimizes the error on the
validation set.
o Stop once additional features no longer improve the model performance or a
maximum number of features is reached.
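
One possible implementation of this forward-selection loop is sketched below. It uses plain LinearRegression as the inner model and validation MSE as the selection criterion (both assumptions), and it expects the training/validation matrices as NumPy arrays, as produced by the scaler in step 1.

import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error

def forward_selection(X_train, y_train, X_val, y_val, max_features=None):
    # Greedily add the feature that most reduces validation MSE; stop when none helps.
    remaining = list(range(X_train.shape[1]))
    selected, best_mse = [], np.inf
    while remaining and (max_features is None or len(selected) < max_features):
        scores = {}
        for f in remaining:
            cols = selected + [f]
            model = LinearRegression().fit(X_train[:, cols], y_train)
            scores[f] = mean_squared_error(y_val, model.predict(X_val[:, cols]))
        f_best = min(scores, key=scores.get)
        if scores[f_best] >= best_mse:       # no candidate improves the model: stop
            break
        best_mse = scores[f_best]
        selected.append(f_best)
        remaining.remove(f_best)
    return selected, best_mse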
5. Applying Regularization Techniques
Use LASSO and Ridge regularization to control
overfitting. These techniques will help to reduce the model complexity by penalizing large
coefficients and potentially zeroing out less relevant features (LASSO).

Steps:

o Implement LASSO and Ridge regression with different values of the regularization
parameter λ.
o Use Grid Search to find the optimal λ value that minimizes the error on the
validation set.
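
A sketch of this λ search using scikit-learn's Lasso and Ridge is given below; λ corresponds to the alpha argument of these estimators. The grid of values is illustrative, and models are compared on the held-out validation set.

import numpy as np
from sklearn.linear_model import Lasso, Ridge
from sklearn.metrics import mean_squared_error

lambdas = np.logspace(-3, 3, 13)     # illustrative grid: 0.001 ... 1000

def best_lambda(model_cls, X_train, y_train, X_val, y_val):
    # Fit one model per lambda and keep the value with the lowest validation MSE.
    val_mse = {}
    for lam in lambdas:
        model = model_cls(alpha=lam).fit(X_train, y_train)
        val_mse[lam] = mean_squared_error(y_val, model.predict(X_val))
    lam_best = min(val_mse, key=val_mse.get)
    return lam_best, val_mse[lam_best]

# lasso_lambda, lasso_mse = best_lambda(Lasso, X_train, y_train, X_val, y_val)
# ridge_lambda, ridge_mse = best_lambda(Ridge, X_train, y_train, X_val, y_val)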
6. Hyperparameter Tuning with Grid Search
Apply grid search to find the best
hyperparameters for each model (e.g., λ for regularized models). This step ensures that each
model is tuned for optimal performance.
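
If you prefer to run this grid search through scikit-learn's GridSearchCV while still using the fixed validation split from step 1 (rather than cross-validation), PredefinedSplit can encode that split. Ridge and the alpha grid below are illustrative; the same pattern applies to the other models' hyperparameters.

import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV, PredefinedSplit

# Mark training rows with -1 (never used for scoring) and validation rows with 0.
X_trval = np.vstack([X_train, X_val])
y_trval = np.concatenate([np.asarray(y_train), np.asarray(y_val)])
split = PredefinedSplit(np.r_[np.full(len(X_train), -1), np.zeros(len(X_val))])

search = GridSearchCV(Ridge(), {"alpha": np.logspace(-3, 3, 13)},
                      scoring="neg_mean_squared_error", cv=split)
search.fit(X_trval, y_trval)
# search.best_params_ holds the lambda (alpha) with the lowest validation MSE.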
7. Model Evaluation on Test Set
After selecting the best model based on the validation set,
evaluate the chosen model on the test set to obtain a final performance metric. Report on
how well the model generalizes to unseen data.
8. Optional: Identify another relevant target variable in the dataset and build a regression
model to predict its values.
9. Reporting the Results
Prepare a detailed report on the findings, including:
o Description of the dataset, preprocessing steps, and features used.
o Details of each regression model and its performance on the validation set.
o Explanation of feature selection results using forward selection.
o Regularization results with the optimal λ values for LASSO and Ridge.
o Model selection process with grid search and hyperparameter tuning.
o Final evaluation on the test set and a discussion of the selected model's performance
and limitations.
o Visualizations to support findings (e.g., feature importances, error distribution, and
model predictions vs. actual values).
Submission Requirements

• Compress the project files (code and report) into a single ZIP file.
• Include the code and the report as separate files inside the ZIP, following the provided file naming format.
• Adhere to the submission deadline and note the late submission policy.

Group Work Policy

This project can be completed in groups of two students at most. Each student should clearly state
their contribution to the project in the report.

Submission:
A- A comprehensive report that describes the dataset and summarizes and discusses
all the results and findings as required above.
B- Your code (Python) in either .py format or as a Jupyter Notebook containing both
the code and visualizations.
C- Please compress your files, including both the code and the report, into a single
ZIP file and submit it to Ritaj before the deadline. The file name should follow
this format: "LastName_ID_Student1_LastName_ID_Student2.ZIP".
D- Late submissions will be accepted up to 3 days after the deadline, with a 10%
deduction for each day delayed.

Hint:
You can use the following Python libraries: Pandas, NumPy, Matplotlib, Seaborn, and
scikit-learn (sklearn).
