WW-M1 Bernardo
Danielito U. Bernardo
Bachelor of Science in Information Technology
Jose Rizal University
Mandaluyong City, Philippines
Abstract—This paper outlines the significance of attribute selection in formulating effective predictive models, with particular emphasis on multiple linear regression. Careful identification and selection of important features from a dataset are pivotal to the accuracy, computational efficiency, and clarity of predictive models. Correlation analysis is emphasized as a fundamental technique for attribute selection, providing a way to isolate the variables that contribute significantly to the model's predictive capability. The paper also details the steps for constructing both simple and multiple linear regression models from the selected variables. The overarching goal is to devise models that not only predict with high accuracy but also yield insights and interpretations directly applicable to real-world data.

Keywords - Coefficient, Correlation, Linear Regression, Multiple Linear Regression, datasets, Models, Attributes, Values, predictive modeling

I. INTRODUCTION

A. Provide an overview of the exercise objective and the importance of improving prediction model performance in machine learning.

The objective of this exercise is to improve the performance of prediction models in machine learning by carefully selecting the most relevant attributes, or features, from the datasets. Doing so supports both the accuracy and the efficiency of the models.

Improving prediction model performance matters for several reasons. First, it produces more precise predictions, which are critical in sectors such as finance, healthcare, and marketing. Precise predictions help businesses make well-informed decisions, streamline processes, and ultimately improve outcomes. Second, it improves computational efficiency. Selecting only the relevant features reduces the model's complexity, which lowers memory usage and processing time. This is particularly significant with large datasets or real-time applications where speed is paramount. Third, it improves interpretability. Pinpointing the most informative attributes gives a deeper understanding of the underlying patterns and relationships in the data, which in turn enables the extraction of meaningful insights and more informed decision-making based on the model's predictions.

In summary, improving prediction model performance is indispensable for achieving precise predictions, computational efficiency, and interpretability. It supports the development of resilient and insightful models that deliver accurate predictions while enriching our understanding of the data.

B. Briefly summarize the prediction model built in the previous exercise and its baseline performance

In the previous exercise, a multiple linear regression model was developed to forecast the selling price of cars. The model uses carefully chosen, strongly correlated variables as predictors, and its initial performance is evaluated with several metrics. The correlation coefficient (r) between the year and the selling price is -0.37, indicating a slight negative correlation and implying a marginal decline in selling price over time. Another important metric is the R-squared score, which measures the proportion of variability in selling price explained by the linear model that includes the year. With a training R-squared score of 0.005, merely 0.5% of the selling price variability can be attributed solely to the year variable. The model also has an intercept of -729.72, meaning that if all predictor variables were zero, the predicted selling price would be -729.72. Overall, the multiple linear regression model explains a notable portion of the selling price variance and generalizes effectively to new data. Nevertheless, the model's performance may be influenced by data cleaning, validation procedures, and the potential for overfitting.
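As a minimal sketch of how the correlation coefficient, R-squared score, and intercept discussed in this section can be computed for a single predictor, the Python snippet below fits a least-squares line to a small, invented year/price sample. The data values and the function name simple_linear_fit are assumptions for illustration only; they are not taken from the exercise dataset, and the numbers printed will differ from those reported above.

```python
import math


def simple_linear_fit(x, y):
    """Least-squares fit of y = intercept + slope * x.

    Returns (slope, intercept, r, r_squared) computed on the data given.
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n

    # Sums of squares and cross-products around the means.
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))

    slope = sxy / sxx
    intercept = mean_y - slope * mean_x

    # Pearson correlation coefficient between x and y.
    r = sxy / math.sqrt(sxx * syy)

    # R-squared: share of the variance in y explained by the fitted line.
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    r_squared = 1 - ss_res / syy
    return slope, intercept, r, r_squared


# Hypothetical sample: model year and selling price (arbitrary units).
years = [2012, 2014, 2015, 2017, 2018, 2020]
prices = [3.1, 4.6, 4.0, 6.2, 5.8, 7.5]

slope, intercept, r, r_squared = simple_linear_fit(years, prices)
print(f"slope={slope:.4f} intercept={intercept:.2f} r={r:.3f} R^2={r_squared:.3f}")
```

Two points are worth checking on any real run. For a least-squares line with one predictor, the training R-squared equals the square of r, so the two figures can be cross-checked against each other. The intercept is the prediction at year 0, far outside the observed range, so a large negative value is not meaningful on its own.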
II. INITIAL MODEL PERFORMANCE

A. Describe the initial performance of the prediction model, including evaluation metrics such as accuracy, precision, recall, or mean squared error (MSE).

When we first tested the prediction model on our training dataset, the results were promising. The Mean Squared Error (MSE), which measures the average squared difference between actual and predicted values, was 126.64. This figure may seem abstract at first glance, but it is crucial for assessing the model's accuracy: on average, the squared deviation between the model's predictions and the actual selling prices is this amount.

Complementing the MSE, the Root Mean Squared Error (RMSE) stood at 11.25. The RMSE is the square root of the MSE (√126.64 ≈ 11.25), which brings the errors back to the original units of the target variable and makes them easier to interpret. An RMSE of 11.25 means that the model's predictions are typically off by approximately 11.25 units from the actual selling prices.

The significance of these metrics can't be overstated. A lower MSE indicates a model that mirrors the training data more closely, suggesting a better fit. In layman's terms, these initial metrics are encouraging because they show our model has a strong grasp of the dataset, promising reliable predictions of selling prices. However, these numbers are only the starting point. They set the baseline for how well the model can be expected to perform, and ideally we'd like to see the errors fall further as we refine the model.

Overall, while the model does a good job on new data and captures some relationship between the year and the selling price, it's clear that relying solely on the year isn't enough. We need to consider other factors to make more precise predictions about car prices.

III. CHALLENGES FACED

A. Identify and discuss the main challenges encountered during model evaluation, as discussed in the lecture on the main challenges of machine learning methods.

During the exercise, several challenges were encountered in the context of predictive modeling. One of the main challenges discussed was dealing with missing values in the datasets. It was acknowledged that if the datasets had missing values, it could have been difficult to decide on an appropriate strategy.

Another challenge mentioned was inconsistent data formats or errors in measurement, which can lead to noise in the model.
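To make the missing-value and inconsistent-format challenges concrete, the sketch below shows one possible way to normalize a raw price column before modeling. It assumes prices may arrive as strings with stray spaces or thousands separators, as numbers, or as placeholders such as "N/A". The sample values, the helper names parse_price and impute_median, and the choice of median imputation are all illustrative assumptions, not the strategy used in the exercise.

```python
import math
from statistics import median

# Placeholder strings treated as missing (an assumed, non-exhaustive list).
MISSING_TOKENS = {"", "na", "n/a", "null", "none", "-"}


def parse_price(raw):
    """Convert one raw entry to float, or None if it is missing or unparseable."""
    if raw is None:
        return None
    if isinstance(raw, (int, float)):
        # NaN is a common in-memory marker for a missing number.
        return None if math.isnan(raw) else float(raw)
    text = str(raw).strip().lower()
    if text in MISSING_TOKENS:
        return None
    # Remove thousands separators so "625,000" parses like "625000".
    text = text.replace(",", "")
    try:
        return float(text)
    except ValueError:
        return None


def impute_median(values):
    """Replace None entries with the median of the observed values."""
    observed = [v for v in values if v is not None]
    fill = median(observed)
    return [fill if v is None else v for v in values]


# Hypothetical raw column mixing formats and missing markers.
raw_prices = ["450000", " 625,000 ", None, "N/A", 375000, "710000.0"]

parsed = [parse_price(p) for p in raw_prices]
cleaned = impute_median(parsed)
print(parsed)
print(cleaned)
```

Median imputation is only one option. Dropping incomplete rows or imputing from related attributes may suit a given dataset better, and the choice should be validated against model performance.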
The issue of overfitting, where the model fits the training data too closely and fails to generalize to new data, was also discussed as a challenge.

V. IMPROVED MODEL PERFORMANCE

A. Present the updated performance metrics of the prediction model after applying the improvement strategies.

B. Show model performance evaluation for both models and discuss your interpretation of the results.

[3] Sumeyra, M. U. T. İ., & YILDIZ, K. (2023). Using linear regression for used car price prediction. International Journal of Computational and Experimental Science and Engineering, 9(1), 11-16. https://www.researchgate.net/publication/369079425_Using_Linear_Regression_For_Used_Car_Price_Prediction