WW-M1 Bernardo

Uploaded by

Paul Justine Tindugan

Predictive Modeling Strategies

Danielito U. Bernardo
Bachelor of Science in Information Technology
Jose Rizal University
Mandaluyong City, Philippines
[email protected]

Abstract - This abstract outlines the significance of attribute selection in the formulation of effective predictive models, with a particular emphasis on its application within multiple linear regression frameworks. The careful identification and selection of important features from a dataset is pivotal in enhancing the accuracy, computational efficiency, and clarity of predictive models. The use of correlation analysis as a fundamental technique for attribute selection is emphasized, providing a pathway for isolating variables that contribute significantly to the predictive capability of the model. Furthermore, this document details the procedural steps for constructing both simple and multiple linear regression models from the selected variables. The overarching goal of this research is to devise models that not only predict with high accuracy but also yield insights and interpretations that are directly applicable to real-world data scenarios.

Keywords - Coefficient, Correlation, Linear Regression, Multiple Linear Regression, datasets, Models, Attributes, Values, predictive modeling

I. INTRODUCTION

A. Provide an overview of the exercise objective and the importance of improving prediction model performance in machine learning.

The objective of this exercise is to enhance the capability of prediction models in machine learning. This is accomplished by carefully selecting the most relevant attributes, or features, from the dataset. Such an approach supports the accuracy and efficiency of the models' predictions. Improving prediction model performance is important in machine learning for several reasons. Firstly, it produces more precise predictions, a critical necessity across diverse sectors such as finance, healthcare, and marketing. These precise predictions help businesses make well-informed decisions, streamline processes, and ultimately improve outcomes. Secondly, improved prediction model performance contributes to greater computational efficiency. Through the judicious selection of relevant features, the model's complexity is reduced, which lowers memory usage and processing time. This is particularly significant when working with sizable datasets or real-time applications where speed is of paramount importance. Furthermore, improving prediction model performance lends itself to improved interpretability. By pinpointing the most informative attributes, we attain a deeper comprehension of the underlying patterns and relationships within the data. This, in turn, enables the extraction of meaningful insights and facilitates more informed decision-making based on the model's predictions. In summary, advancing prediction model performance in machine learning is indispensable for achieving precision in predictions, computational efficiency, and interpretability. It empowers the development of resilient and insightful models capable of furnishing accurate predictions while enriching our comprehension of the data.

B. Briefly summarize the prediction model built in the previous exercise and its baseline performance

In the previous exercise, a multiple linear regression model was developed to forecast the selling price of cars. This model uses carefully chosen variables with strong correlations as predictors. Evaluating the model's initial performance involves several metrics. Notably, the correlation coefficient (r) between the year and the selling price stands at -0.37, indicating a slight negative correlation and implying a marginal decline in selling price over time. Another important metric is the R-squared score, which measures how much of the variability in selling price is explained by the linear model incorporating the year. With a training-data R-squared score of 0.005, merely 0.5% of the selling price variability can be attributed to the year variable alone. Furthermore, the model has an intercept of -729.72, meaning that if all predictor variables were zero, the predicted selling price would be -729.72. Overall, the multiple linear regression model explains a notable portion of the selling price variance and generalizes effectively to new data. Nevertheless, it is crucial to acknowledge that the model's performance may be influenced by data cleaning, validation procedures, and the potential for overfitting.
II. INITIAL MODEL PERFORMANCE

A. Describe the initial performance of the prediction model, including evaluation metrics such as accuracy, precision, recall, or mean squared error (MSE).

When we first tested the prediction model on our training dataset, the results were promising. The Mean Squared Error (MSE), which measures the average of the squared differences between actual and predicted values, was recorded at 126.64. This figure might seem abstract at first glance, but it is crucial for assessing the model's accuracy. In simpler terms, the MSE tells us that the squared deviation of the model's predictions from the actual selling prices is, on average, this amount.

Complementing the MSE, the Root Mean Squared Error (RMSE) stood at 11.25. The RMSE is the square root of the MSE (the square root of 126.64 is about 11.25), and it is particularly insightful because it brings the scale of our errors back to the original units of our target variable, making it easier to interpret. An RMSE of 11.25 means that, typically, the model's predictions are off by approximately 11.25 units from the actual selling prices.

The significance of these metrics can't be overstated. A lower MSE indicates a model that more closely mirrors the training data, suggesting a better fit. In layman's terms, these initial metrics are encouraging because they show our model has a strong grasp of the dataset, promising reliable predictions of selling prices. However, it is crucial to remember that these numbers are just the starting point. They set the baseline for how well the model can be expected to perform, and ideally we would like to see these errors fall further as we refine the model.

B. Present any insights gained from analyzing the model's performance on the test datasets.

When we assess the model's performance, we look at key metrics such as the correlation coefficient and the R-squared score. These metrics help us understand how accurately the model predicts car selling prices. For instance, the correlation coefficient between the year and the selling price is -0.37. This suggests a small negative correlation, indicating a slight decrease in selling price over time. Then there is the R-squared score, which tells us how much of the selling price variability can be explained by the model. On the training data, the R-squared score is only 0.005, meaning just 0.5% of the variability can be attributed to the year alone. But here is the interesting part: when we test the model on new data, the R-squared score jumps to 0.864. This shows that the model performs really well on unseen data, accurately predicting selling prices for cars not in the training set. The model also includes an intercept of -729.72, which serves as the starting point for predictions when all other variables are zero.

Overall, while the model does a good job on new data and captures some relationship between the year and the selling price, it is clear that relying solely on the year isn't enough. We need to consider other factors to make more precise predictions about car prices.

III. CHALLENGES FACED

A. Identify and discuss the main challenges encountered during model evaluation, as discussed in the lecture on the main challenges of machine learning methods.

During the exercise, several challenges were encountered in the context of predictive modeling. One of the main challenges discussed was dealing with missing values in the dataset. It was acknowledged that if the dataset had contained missing values, it could have been difficult to decide on an appropriate strategy.

Another challenge mentioned was inconsistent data formats or errors in measurement, which can lead to noise in the model.
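The two data-quality challenges mentioned above, missing values and inconsistent formats, can be illustrated with a small sketch. Everything here is assumed for the example: the "mileage" values are made up, the helper parse_mileage is hypothetical, and median imputation is only one of several possible strategies, not necessarily the one used in the exercise. Libraries such as pandas offer equivalent operations; the standard library is used here to keep the sketch self-contained.

```python
import statistics

# Hypothetical raw "mileage" entries with mixed formats and one missing value.
raw_mileage = ["45,000", "30000", None, "82 000"]


def parse_mileage(value):
    """Return the entry as a float, or None if it is missing or unparseable."""
    if value is None:
        return None
    try:
        # Strip thousands separators written as commas or spaces.
        return float(value.replace(",", "").replace(" ", ""))
    except ValueError:
        return None


parsed = [parse_mileage(v) for v in raw_mileage]

# Count the gaps before deciding on a strategy.
missing_count = sum(v is None for v in parsed)

# Median imputation: fill each gap with the median of the observed values.
median_mileage = statistics.median(v for v in parsed if v is not None)
imputed = [median_mileage if v is None else v for v in parsed]

print(f"missing entries: {missing_count}")
print(f"imputed mileage: {imputed}")
```

The median is less sensitive to extreme values than the mean, which is why it is often preferred for skewed attributes; the choice should still be checked against the actual distribution of the data.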
The issue of overfitting, where the model fits the training data too closely and fails to generalize to new data, was also discussed as a challenge.

B. For each challenge, provide specific examples from the datasets or model evaluation process

When evaluating models with datasets that contain missing values, determining a suitable strategy can pose a challenge. For instance, if the car dataset has missing values in the "mileage" attribute, addressing these gaps during the model evaluation process becomes necessary. This typically involves handling missing values by imputation with either the mean or the median, or by employing more sophisticated techniques such as regression imputation.

In the model evaluation phase, when the linear regression model shows exceptional performance on the training dataset but struggles with unseen data, it suggests overfitting. For example, if the model accurately predicts car selling prices within the training set but fails to generalize to new data, techniques such as cross-validation become necessary to detect and curb overfitting and to enhance the model's generalization ability.

When the linear regression model lacks complexity and cannot adequately capture the inherent patterns within the car dataset, it leads to underfitting. Consequently, the model exhibits high bias and low variance, resulting in subpar performance on both the training and test datasets. To address underfitting, enhancements to the feature engineering process, the inclusion of additional pertinent features, or the adoption of more complex models may be necessary during the model evaluation stage.

IV. STRATEGIES APPLIED

A. Discuss and show how you built both the simple and multiple linear models using the selected variables.

B. Show model performance evaluation for both models and discuss your interpretation of the results.

V. IMPROVED MODEL PERFORMANCE

A. Present the updated performance metrics of the prediction model after applying the improvement strategies.

B. Provide a comparative analysis between the initial and improved model performance.

C. Discuss what you think are implications of your observation for real-world applications

VI. CONCLUSIONS

A. Summarize the key findings of the laboratory exercise, including the challenges faced, strategies applied, and the impact on model performance.

B. Reflect on the importance of addressing model challenges and continuous improvement in machine learning projects

REFERENCES

[1] Salim, F., & Abu, N. A. (2021). Used car price estimation: Moving from linear regression towards a new s-curve model. International Journal of Business and Society, 22(3), 1174-1187. https://www.researchgate.net/publication/357130079_Used_Car_Price_Estimation_Moving_from_Linear_Regression_towards_a_New_S-Curve_Model

[2] Sharma, A. D., & Sharma, V. (2020). Used Car Price Prediction using Linear Regression Model. IRJMETS, 2, 946-953. https://www.irjmets.com/uploadedfiles/paper/volume2/issue_11_november_2020/4868/1628083194.pdf

[3] Sumeyra, M. U. T. İ., & YILDIZ, K. (2023). Using linear regression for used car price prediction. International Journal of Computational and Experimental Science and Engineering, 9(1), 11-16. https://www.researchgate.net/publication/369079425_Using_Linear_Regression_For_Used_Car_Price_Prediction
