Linear Regression Summary

The document provides an overview of linear regression in machine learning, focusing on the concepts of underfitting and overfitting, as well as bias and variance. It discusses various evaluation metrics for regression models, including Mean Absolute Error (MAE), Mean Squared Error (MSE), and R Squared (R2), highlighting their advantages and disadvantages. Additionally, it touches on classification metrics and the importance of using multiple evaluation metrics to optimize model performance.

Uploaded by

lawrencechikopa1
Copyright
© All Rights Reserved
LINEAR REGRESSION

SUMMARY
Compiled by Ada Kamunthuuli
Regression
• Regression is a type of machine learning that models the
relationship between independent variables and a
dependent variable.
• In simple words, regression is a machine learning problem
in which we predict continuous values such as price,
rating, or fees.
Underfitting and Overfitting
• When we talk about a machine learning model, we are really talking
about how well it performs, which is measured by its prediction
error.
• Suppose we are designing a machine learning model. A model is said
to be good if it generalizes properly to any new input data from the
problem domain.
• This lets us make predictions on future data that the model has
never seen. Now, suppose we want to check how well our model learns
and generalizes to new data.
• For that, we look at overfitting and underfitting, the two main
causes of poor performance in machine learning algorithms.
Bias and Variance
• Bias: the assumptions a model makes so that the target function is
easier to learn. In practice it shows up as the error rate on the
training data. A high training error is called high bias, and a low
training error is called low bias.
• Variance: the difference between the error rate on training data and
on testing data. If the difference is large it is called high
variance, and if it is small it is called low variance. Usually, we
want low variance so that our model generalizes.
Underfitting
• A statistical model or a machine learning algorithm is said to
underfit when it cannot capture the underlying trend of the
data, i.e., it performs poorly on the training data as well as
on the testing data. (It’s just like trying to fit into
undersized pants!)
• Underfitting destroys the accuracy of our machine learning
model. Its occurrence simply means that our model or the
algorithm does not fit the data well enough.
Underfitting
• It usually happens when we have too little data to build an
accurate model, and also when we try to fit a linear model
to non-linear data.
• In such cases, the rules of the machine learning model are
too simple and rigid to capture the data, and therefore the
model will probably make a lot of wrong predictions.
• Underfitting can be reduced by using more data and by
adding informative features through feature engineering.
Reasons for Underfitting
1. High bias and low variance
2. The size of the training dataset used is not enough.
3. The model is too simple.
4. Training data is not cleaned and also contains noise in it.
Techniques to reduce underfitting
1. Increase model complexity
2. Increase the number of features, performing feature
engineering
3. Remove noise from the data.
4. Increase the number of epochs or increase the duration of
training to get better results.
Overfitting
• A statistical model is said to be overfitted when it fits the
training data very well but does not make accurate
predictions on testing data.
• When a model is trained too long or too closely on a
dataset, it starts learning from the noise and inaccurate
entries in that dataset.
• Testing with test data then shows high variance: the model
does not predict correctly on new data, because it has
learned too many details and too much noise.
Overfitting…
• Overfitting is more likely with non-parametric and non-linear
methods, because these types of machine learning algorithms
have more freedom in building the model from the dataset
and can therefore build unrealistic models.
• One way to avoid overfitting is to use a linear algorithm if
we have linear data, or to limit parameters such as the
maximal depth if we are using decision trees.
Reasons for Overfitting are as follows
1. High variance and low bias
2. The model is too complex
3. The training dataset is too small
Techniques to reduce overfitting
1. Increase training data.
2. Reduce model complexity.
3. Early stopping during the training phase (keep an eye on
the validation loss during training and stop as soon as it
begins to increase).
4. Ridge Regularization and Lasso Regularization
5. Use dropout for neural networks to tackle overfitting.
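As a rough illustration of item 4, the sketch below fits a one-feature linear model with a ridge (L2) penalty on the slope. The function name `fit_ridge_1d`, the penalty value and the toy data are assumptions made for this example; they do not come from the slides. The point is only that a larger penalty shrinks the slope and so reduces model complexity.

```python
def fit_ridge_1d(xs, ys, lam):
    """Fit y = intercept + slope * x, penalizing lam * slope**2 (intercept unpenalized)."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    # Centered sums used by the closed-form solution.
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    slope = sxy / (sxx + lam)  # lam = 0 gives ordinary least squares
    intercept = y_mean - slope * x_mean
    return slope, intercept


# Toy data lying exactly on y = 2x (hypothetical values).
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 4.0, 6.0, 8.0, 10.0]

plain_slope, _ = fit_ridge_1d(xs, ys, lam=0.0)
ridge_slope, _ = fit_ridge_1d(xs, ys, lam=10.0)
print(plain_slope, ridge_slope)
```

With the penalty switched off the slope is the least-squares slope; with `lam=10.0` the slope is pulled toward zero.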
Good Fit in a Statistical Model
• Ideally, a model that makes its predictions with 0 error is
said to have a good fit on the data; in practice we aim for
low error on both training and unseen data.
• This situation is achievable at a spot between overfitting and
underfitting. To understand it, we have to look at the
performance of our model over time, while it is learning
from the training dataset.
Good Fit in a Statistical Model
• As training goes on, our model keeps learning, and its error on
both the training and testing data keeps decreasing.
• If it trains for too long, the model becomes more prone to
overfitting because it picks up noise and less useful details.
The error on testing data then starts to rise, so the performance
of our model decreases.
• To get a good fit, we stop at the point just before the testing
error starts increasing. At this point, the model is said to have
good skill on the training dataset as well as on our unseen
testing dataset.
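The stopping rule described above can be sketched as follows. The helper `early_stopping_epoch`, the `patience` parameter and the list of validation losses are hypothetical; a real training loop would compute the loss after each epoch instead of reading it from a list.

```python
def early_stopping_epoch(val_losses, patience=2):
    """Return (epoch, loss) of the best validation loss seen before stopping.

    Training stops once the loss has failed to improve for `patience`
    consecutive epochs after the best epoch.
    """
    best_epoch = 0
    best_loss = float("inf")
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss:
            best_loss = loss
            best_epoch = epoch
        elif epoch - best_epoch >= patience:
            break  # the error has started increasing: stop training
    return best_epoch, best_loss


# Hypothetical validation losses: they fall, then rise as overfitting sets in.
val_losses = [0.90, 0.60, 0.45, 0.40, 0.42, 0.47, 0.55]
best_epoch, best_loss = early_stopping_epoch(val_losses)
print(best_epoch, best_loss)
```

Here the model from epoch 3 (counting from zero) would be kept, since the loss only gets worse afterwards.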
A. Evaluation Metrics for Regression Model
• Many beginners, and even practitioners, often do not pay
enough attention to model performance. The goal is to build
a well-generalized model. A machine learning model cannot
be 100 per cent accurate; a model that appears to be is
usually biased toward its training data, which ties back to
the concepts of overfitting and underfitting.
• It is necessary to measure accuracy on the training data, but
it is just as important to get a genuine, reasonably accurate
result on unseen data; otherwise the model is of no use.
Evaluation Metrics for Regression Model
• So, to build and deploy a generalized model, we need to evaluate the
model on different metrics, which helps us optimize its performance,
fine-tune it, and obtain a better result.
• No single metric is perfect; if one were, there would be no need for
multiple metrics. We need to understand the advantages and
disadvantages of each evaluation metric, because different metrics
suit different kinds of datasets.
• With the importance of evaluation metrics established, let’s start
looking at the various evaluation metrics used for regression tasks.
Dataset
• To demonstrate each evaluation metric using the scikit-learn
library, we will use the placement dataset, which is a simple linear
dataset that looks something like this.

• Let’s start exploring the various evaluation metrics.


1) Mean Absolute Error (MAE)
• MAE is a very simple metric that calculates the mean absolute
difference between actual and predicted values.
• To better understand it, take an example: you have input
data and output data and use linear regression, which
draws a best-fit line.
• Now you have to find the MAE of your model. The mistake
the model makes on an observation is known as its error.
The difference between the actual value and the predicted
value, taken without its sign, is the absolute error, and we
need the mean of these absolute errors over the complete
dataset.
MAE
• So, sum all the absolute errors and divide by the total
number of observations; this is the MAE. We aim for the
minimum MAE because it is a loss.
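A minimal sketch of this calculation in plain Python is given below. The four actual and predicted values are made up for illustration and are not taken from the placement dataset. scikit-learn offers the same calculation as `sklearn.metrics.mean_absolute_error`.

```python
def mean_absolute_error(y_true, y_pred):
    """Mean of |actual - predicted| over all observations."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical actual and predicted values.
y_true = [3.0, 5.0, 2.5, 7.0]
y_pred = [2.5, 5.0, 4.0, 8.0]

# Absolute errors are 0.5, 0.0, 1.5 and 1.0, so their mean is 0.75.
mae = mean_absolute_error(y_true, y_pred)
print(mae)
```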
Advantages and Disadvantages of MAE
Advantages
• The MAE you get is in the same unit as the output variable.
• It is more robust to outliers than MSE.
Disadvantages
• The graph of MAE is not differentiable at zero, so optimizers
such as gradient descent, which rely on derivatives, cannot
be applied to it directly.

➢To overcome this disadvantage of MAE, the next metric is MSE.
2) Mean Squared Error (MSE)
• MSE is one of the most used and simplest metrics, a small change
from mean absolute error. Mean squared error is the mean of the
squared differences between actual and predicted values.
• So, above we took the absolute difference, and here we take the
squared difference.
• What does the MSE actually represent? It represents the mean squared
distance between actual and predicted values. We square the
differences so that negative and positive errors do not cancel, which
is the benefit of MSE.
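The squared version can be sketched the same way. The values below are illustrative assumptions, and `sklearn.metrics.mean_squared_error` is the library equivalent.

```python
def mean_squared_error(y_true, y_pred):
    """Mean of (actual - predicted)**2 over all observations."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical actual and predicted values.
y_true = [3.0, 5.0, 2.5, 7.0]
y_pred = [2.5, 5.0, 4.0, 8.0]

# Squared errors are 0.25, 0.0, 2.25 and 1.0, so their mean is 0.875.
mse = mean_squared_error(y_true, y_pred)
print(mse)
```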
MSE
Advantages and Disadvantages
Advantage
The graph of MSE is differentiable, so you can easily use it as a loss
function.
Disadvantages
• The value you get after calculating MSE is in squared units of the
output. For example, if the output variable is in metres (m), then the
MSE is in metres squared.
• If you have outliers in the dataset, MSE penalizes them the most and
the calculated MSE is larger. In short, it is not robust to outliers,
which was an advantage of MAE.
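The effect of an outlier on the two metrics can be seen with a small experiment. The error values below are invented for illustration: both lists contain four prediction errors, and the second list replaces one error with a large outlier.

```python
def mae(errors):
    """Mean absolute error computed from a list of raw errors."""
    return sum(abs(e) for e in errors) / len(errors)


def mse(errors):
    """Mean squared error computed from a list of raw errors."""
    return sum(e ** 2 for e in errors) / len(errors)


clean = [1.0, -1.0, 1.0, -1.0]          # hypothetical errors, no outlier
with_outlier = [1.0, -1.0, 1.0, -10.0]  # same errors, one outlier

print(mae(clean), mse(clean))                # 1.0 1.0
print(mae(with_outlier), mse(with_outlier))  # 3.25 25.75
```

The single outlier roughly triples the MAE but multiplies the MSE by more than twenty-five.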
3) Root Mean Squared Error (RMSE)
• As the name makes clear, RMSE is simply the square root of the
mean squared error.
Advantages and Disadvantages of RMSE
Advantage of RMSE
• The output value you get is in the same unit as the required output
variable which makes interpretation of loss easy.
Disadvantage of RMSE
• It is not as robust to outliers as MAE.
➢To compute RMSE, we apply the NumPy square root function to the
MSE.
➢RMSE is very commonly used as an evaluation metric, and it is often
the preferred metric when working with deep learning techniques.
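A small sketch of RMSE follows. To keep it dependency-free it uses `math.sqrt` from the standard library in place of the NumPy square root function; the data values are assumptions for illustration.

```python
import math


def root_mean_squared_error(y_true, y_pred):
    """Square root of the mean squared error, in the unit of the output."""
    mse = sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)
    return math.sqrt(mse)


# Hypothetical actual and predicted values.
y_true = [3.0, 5.0, 2.5, 7.0]
y_pred = [2.5, 5.0, 4.0, 8.0]

rmse = root_mean_squared_error(y_true, y_pred)
print(rmse)  # square root of 0.875
```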
Other metrics but not in our syllabus
4) Root Mean Squared Log Error (RMSLE)
• Taking the log before computing the error compresses the scale of
the error. The metric is very helpful when the target values have not
been scaled and therefore vary over a large range.
• To control this situation, RMSLE is computed like RMSE, but on the
logarithms of the actual and predicted values (usually log(1 + value))
rather than on the values themselves.
• To compute RMSLE, we apply the NumPy log function to the actual and
predicted values before computing the RMSE.
• It is a very simple metric that is used by many of the datasets
hosted for machine learning competitions.
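The sketch below uses the common definition of RMSLE, which is the RMSE of log(1 + value) for actual and predicted values. `math.log1p` stands in for the NumPy log function, and the numbers are illustrative assumptions. It assumes all values are greater than -1 so that the logarithm is defined.

```python
import math


def rmsle(y_true, y_pred):
    """RMSE computed on log(1 + value) instead of on the raw values."""
    squared = [
        (math.log1p(p) - math.log1p(t)) ** 2 for t, p in zip(y_true, y_pred)
    ]
    return math.sqrt(sum(squared) / len(squared))


# A 10% over-prediction gives a similar RMSLE at very different scales.
small_scale = rmsle([100.0], [110.0])
large_scale = rmsle([1000.0], [1100.0])
print(small_scale, large_scale)
```

Because the error is measured on a log scale, it reflects the relative rather than the absolute size of the mistake.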
5) R Squared (R2)
• The R2 score is a metric that tells you how well your model
performed, rather than giving the loss in an absolute sense.
• MAE and MSE depend on the context, as we have seen, whereas the R2
score is independent of context.
• So, with the help of R squared we have a baseline model to compare
against, which none of the other metrics provides. This is similar to
classification problems, where we have a threshold that is fixed at
0.5. So basically R squared calculates how much better the regression
line is than a mean line.
R2
• Hence, R squared is also known as the Coefficient of Determination,
or sometimes as Goodness of fit.
R2
• Now, how do we interpret the R2 score? R2 is one minus the ratio of the
regression line's squared error to the mean line's squared error. If the R2
score is zero, that ratio is 1, so 1 − 1 is zero. In this case both lines
overlap, which means the model performs no better than always predicting the
mean; it is not able to take advantage of the input columns. A model that is
worse than the mean line gets a negative R2 score.
• The second case is when the R2 score is 1. This means the division term is
zero, which happens when the regression line does not make any mistake; it
is perfect. In the real world, this is rarely possible.
• So we can conclude that as our regression line moves towards perfection,
the R2 score moves towards one and the model performance improves.
• The normal case is an R2 score between zero and one, such as 0.8, which
means your model is able to explain 80 per cent of the variance of the
data.
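These cases can be checked with a short sketch. It assumes the usual definition R2 = 1 − SS_res / SS_tot, where SS_res is the squared error of the regression line and SS_tot is the squared error of the mean line; the data values are made up for illustration.

```python
def r2_score(y_true, y_pred):
    """1 - (squared error of the model) / (squared error of the mean line)."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


y_true = [1.0, 2.0, 3.0, 4.0, 5.0]  # hypothetical actual values, mean 3.0

perfect = r2_score(y_true, y_true)                    # no mistakes at all
mean_only = r2_score(y_true, [3.0] * 5)               # always predicts the mean
good = r2_score(y_true, [1.2, 1.8, 3.1, 4.3, 4.6])    # close to the data
worse = r2_score(y_true, [5.0, 4.0, 3.0, 2.0, 1.0])   # worse than the mean line
print(perfect, mean_only, good, worse)
```

The last case shows that the score is not bounded below by zero: predictions that are worse than the mean line give a negative R2.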
6) Adjusted R Squared
• The disadvantage of the R2 score is that when new features are added
to the data, the R2 score increases or remains constant but never
decreases. For a least-squares fit evaluated on the training data, an
extra feature can always be ignored, so the error cannot get worse.
• The problem is that when we add an irrelevant feature to the dataset,
R2 can still increase slightly, which is misleading.
• Hence, to control this situation, Adjusted R Squared came into
existence.
Adjusted R Squared (Cont’)
• Adjusted R squared is 1 − (1 − R2)(n − 1)/(n − k − 1), where n is the
number of observations and k is the number of features. As k increases
when features are added, the denominator n − k − 1 decreases while
n − 1 remains constant. If the feature is irrelevant, the R2 score
remains constant or increases only slightly, so the complete fraction
increases, and when we subtract it from one the resulting score
decreases.
• If we add a relevant feature, the R2 score increases and 1 − R2
decreases heavily. The denominator also decreases, but the complete
term still decreases, and on subtracting it from one the score
increases.
• Hence, this is one of the most important metrics to use when
evaluating a model.
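The behaviour described above can be checked numerically. The sketch assumes the standard formula 1 − (1 − R2)(n − 1)/(n − k − 1), with n observations and k features; the R2 values, n and k below are invented for illustration.

```python
def adjusted_r2(r2, n, k):
    """Adjusted R squared for n observations and k features."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)


# Hypothetical model with 3 features on 50 observations.
base = adjusted_r2(0.80, n=50, k=3)

# Adding an irrelevant 4th feature barely moves R2, so the adjusted score drops.
irrelevant = adjusted_r2(0.801, n=50, k=4)

# Adding a relevant 4th feature raises R2 a lot, so the adjusted score rises.
relevant = adjusted_r2(0.90, n=50, k=4)

print(base, irrelevant, relevant)
```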
B. Evaluation Metrics for Classification Problem
• Evaluation metrics are tied to machine learning tasks. There are different
metrics for the tasks of classification and regression. Some metrics, like
precision-recall, are useful for multiple tasks.
• Classification and regression are examples of supervised learning, which
constitutes a majority of machine learning applications. Using different
metrics for performance evaluation, we should be able to improve our
model’s overall predictive power before we roll it out for production on
unseen data.
• Evaluating a machine learning model on accuracy alone, without a proper
evaluation using different metrics, can lead to problems when the model
is deployed on unseen data and may end in poor predictions.
Classification Metrics
• Classification is about predicting the class labels given input data. In
binary classification, there are only two possible output classes(i.e.,
Dichotomy). In multiclass classification, more than two possible
classes can be present. I’ll focus only on binary classification.
• A very common example of binary classification is spam detection,
where the input data could include the email text and metadata
(sender, sending time), and the output label is either “spam” or “not
spam.” (See Figure) Sometimes, people use some other names also
for the two classes: “positive” and “negative,” or “class 1” and “class
0.”
Classification Metrics

• There are many ways of measuring classification performance.
Accuracy, confusion matrix, log-loss, and AUC-ROC are some of the
most popular metrics. Precision-recall is also a widely used metric
for classification problems.
Confusion Matrix
• Confusion Matrix is a performance measurement for the machine
learning classification problems where the output can be two or more
classes. It is a table with combinations of predicted and actual values.

• A confusion matrix is defined as the table that is often used to
describe the performance of a classification model on a set of test
data for which the true values are known.
Confusion Matrix

• It is extremely useful for measuring Recall, Precision, Accuracy,
and the AUC-ROC curve.
Understanding TP, FP, FN, TN with a pregnancy analogy
Understanding TP, FP, FN, TN
• True Positive: We predicted positive and it’s true. In the image, we
predicted that a woman is pregnant and she actually is.
• True Negative: We predicted negative and it’s true. In the image, we
predicted that a man is not pregnant and he actually is not.
• False Positive (Type 1 Error)- We predicted positive and it’s false. In
the image, we predicted that a man is pregnant but he actually is not.
• False Negative (Type 2 Error)- We predicted negative and it’s false. In
the image, we predicted that a woman is not pregnant but she
actually is.
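The four outcomes can be counted directly from lists of actual and predicted labels. In this sketch 1 stands for the positive class and 0 for the negative class; the helper name `confusion_counts` and the label lists are assumptions made for illustration.

```python
def confusion_counts(y_true, y_pred):
    """Return (TP, FP, FN, TN) for binary labels where 1 is the positive class."""
    tp = fp = fn = tn = 0
    for actual, predicted in zip(y_true, y_pred):
        if predicted == 1 and actual == 1:
            tp += 1  # predicted positive and it is true
        elif predicted == 1 and actual == 0:
            fp += 1  # predicted positive but it is false (Type 1 error)
        elif predicted == 0 and actual == 1:
            fn += 1  # predicted negative but it is false (Type 2 error)
        else:
            tn += 1  # predicted negative and it is true
    return tp, fp, fn, tn


# Hypothetical actual and predicted labels for eight cases.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 0, 1, 0, 1, 0, 0, 1]

tp, fp, fn, tn = confusion_counts(y_true, y_pred)
print(tp, fp, fn, tn)  # 3 1 1 3
```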
1. Accuracy
• Accuracy simply measures how often the classifier correctly predicts.
We can define accuracy as the ratio of the number of correct
predictions and the total number of predictions.
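In terms of the confusion matrix, this ratio is (TP + TN) divided by all predictions. The sketch below uses hypothetical counts, including an imbalanced case that shows why accuracy on its own can mislead.

```python
def accuracy(tp, fp, fn, tn):
    """Correct predictions (TP + TN) divided by all predictions."""
    return (tp + tn) / (tp + fp + fn + tn)


# Balanced example: 6 of 8 predictions are correct.
balanced = accuracy(tp=3, fp=1, fn=1, tn=3)

# Imbalanced example: the classifier predicts "negative" for all 100 cases
# and misses all 5 positives, yet accuracy still looks high.
all_negative = accuracy(tp=0, fp=0, fn=5, tn=95)

print(balanced, all_negative)
```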
2. Precision
• Precision explains how many of the cases predicted as positive actually
turned out to be positive. Precision is useful in the cases where False
Positives are a higher concern than False Negatives.
• Precision is important in music or video recommendation systems,
e-commerce websites, etc., where wrong results could lead to customer
churn and this could be harmful to the business.
➢Precision for a label is defined as the number of true positives divided
by the number of predicted positives.
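A minimal sketch of this definition is given below. The counts are hypothetical, and returning 0.0 when nothing is predicted positive is a convention chosen for this example.

```python
def precision(tp, fp):
    """True positives divided by all predicted positives (TP + FP)."""
    predicted_positives = tp + fp
    if predicted_positives == 0:
        return 0.0  # assumed convention when no positive is predicted
    return tp / predicted_positives


# Hypothetical counts: 3 of the 4 predicted positives are really positive.
p = precision(tp=3, fp=1)
print(p)  # 0.75
```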
3. Recall (Sensitivity)
• Recall explains how many of the actual positive cases we were able to
predict correctly with our model. It is a useful metric in cases where
False Negative is of higher concern than False Positive.
• It is important in medical cases where it doesn’t matter whether we
raise a false alarm but the actual positive cases should not go
undetected!
➢Recall for a label is defined as the number of true positives divided by
the total number of actual positives.
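Recall can be sketched in the same way. The counts are hypothetical, and returning 0.0 when there are no actual positives is again a convention chosen for this example.

```python
def recall(tp, fn):
    """True positives divided by all actual positives (TP + FN)."""
    actual_positives = tp + fn
    if actual_positives == 0:
        return 0.0  # assumed convention when there are no actual positives
    return tp / actual_positives


# Hypothetical screening result: 8 of 10 actual positive cases are detected.
r = recall(tp=8, fn=2)
print(r)  # 0.8
```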
4. F1 Score
• It gives a combined idea about the Precision and Recall metrics. For
a given average of Precision and Recall, it is maximum when the two
are equal.
• The F1 score punishes extreme values more.
➢F1 Score is the harmonic mean of precision and recall.
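The harmonic mean can be sketched as follows. The precision and recall values are illustrative assumptions, chosen so that both pairs have the same arithmetic mean of 0.5.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0  # assumed convention when both are zero
    return 2 * precision * recall / (precision + recall)


balanced = f1_score(0.5, 0.5)  # precision and recall are equal
extreme = f1_score(0.9, 0.1)   # same arithmetic mean, very unequal values

print(balanced, extreme)  # 0.5 and roughly 0.18
```

The unequal pair scores far below 0.5, which is how the F1 score punishes extreme values.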
F1 Score
❖F1 Score could be an effective evaluation metric in the following
cases:
• When FP and FN are equally costly.
• Adding more data doesn’t effectively change the outcome
• True Negative is high
5. AUC-ROC
• The Receiver Operating Characteristic (ROC) is a probability curve that
plots the TPR (True Positive Rate) against the FPR (False Positive Rate)
at various threshold values, separating the ‘signal’ from the ‘noise’.
• The Area Under the Curve (AUC) measures the ability of a classifier to
distinguish between classes. In the graph, it is simply the area
enclosed by the curve ABDE and the X and Y axes.
• The greater the AUC, the better the model is at separating the
positive and negative classes across different threshold points.
AUC-ROC cont’
• This simply means that when AUC is equal to 1, the classifier is able
to perfectly distinguish between all Positive and Negative class points.
• When AUC is equal to 0, the classifier predicts all Negatives as
Positives and vice versa.
• When AUC is 0.5, the classifier is not able to distinguish between the
Positive and Negative classes.
AUC-ROC cont’
Working of AUC
• In a ROC curve, the X-axis value shows False Positive Rate (FPR), and
Y-axis shows True Positive Rate (TPR).
• A higher X-axis value means a higher proportion of False Positives (FP)
relative to True Negatives (TN), while a higher Y-axis value indicates a
higher proportion of TP relative to FN.
• So, the choice of the threshold depends on the ability to balance
between FP and FN.
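The construction of the curve can be sketched in plain Python: each distinct score is used as a threshold, the (FPR, TPR) point is recorded, and the area is summed with the trapezoid rule. The function names, labels and scores are assumptions for illustration, and the sketch assumes both classes are present in the labels. In practice `sklearn.metrics.roc_auc_score` provides this calculation.

```python
def roc_points(y_true, scores):
    """(FPR, TPR) points obtained by using each distinct score as a threshold."""
    positives = sum(y_true)
    negatives = len(y_true) - positives
    points = [(0.0, 0.0)]
    for threshold in sorted(set(scores), reverse=True):
        # A case is predicted positive when its score reaches the threshold.
        tp = sum(1 for t, s in zip(y_true, scores) if s >= threshold and t == 1)
        fp = sum(1 for t, s in zip(y_true, scores) if s >= threshold and t == 0)
        points.append((fp / negatives, tp / positives))
    return points


def auc(points):
    """Area under the piecewise-linear curve through the points (trapezoid rule)."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area


# Hypothetical labels and predicted probabilities for four cases.
y_true = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]

roc_auc = auc(roc_points(y_true, scores))
perfect_auc = auc(roc_points(y_true, [0.1, 0.2, 0.8, 0.9]))
print(roc_auc, perfect_auc)
```

The first set of scores ranks one negative above one positive, so the area is below 1; the second set separates the classes completely.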
6. Log Loss (bonus metric)
• Log loss (Logistic loss) or Cross-Entropy Loss is one of the major
metrics to assess the performance of a classification problem.
• For a single sample with true label y∈{0,1} and a probability estimate
p=Pr(y=1), the log loss is: −(y log(p) + (1 − y) log(1 − p)).
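The per-sample log loss can be sketched in plain Python as below, with the formula written out in the docstring. The clipping constant `eps` is an assumption added to avoid taking log(0), and the example probabilities are invented for illustration.

```python
import math


def log_loss_single(y, p, eps=1e-15):
    """Log loss of one sample: -(y * log(p) + (1 - y) * log(1 - p))."""
    p = min(max(p, eps), 1 - eps)  # keep p strictly inside (0, 1)
    return -(y * math.log(p) + (1 - y) * math.log(1 - p))


def log_loss(y_true, probs):
    """Mean log loss over all samples."""
    losses = [log_loss_single(y, p) for y, p in zip(y_true, probs)]
    return sum(losses) / len(losses)


# A confident correct prediction is cheap, a confident wrong one is expensive.
confident_right = log_loss_single(1, 0.9)
confident_wrong = log_loss_single(1, 0.1)
print(confident_right, confident_wrong)
print(log_loss([1, 0, 1], [0.9, 0.2, 0.6]))
```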
Steps for Evaluating Data
1. Read data
2. Create the independent and dependent variable data frames
3. Split the data into trainset and test set
4. Train the model using trainset and different algorithms
5. Evaluate
6. Choose the model with best performance.
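The six steps can be sketched end to end in plain Python. Everything here is an illustrative assumption: the data is generated in code rather than read from a file, the two candidate "algorithms" are a mean baseline and a one-feature least-squares line, and test-set MSE is used as the evaluation metric.

```python
import random


def fit_mean(xs, ys):
    """Baseline model: always predict the mean of the training targets."""
    mean_y = sum(ys) / len(ys)
    return lambda x: mean_y


def fit_linear(xs, ys):
    """Simple linear regression fitted by least squares."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    return lambda x: intercept + slope * x


def mse(model, xs, ys):
    """Mean squared error of a fitted model on the given data."""
    return sum((y - model(x)) ** 2 for x, y in zip(xs, ys)) / len(xs)


# Steps 1-2: "read" the data as (independent, dependent) pairs.
# Hypothetical data: y = 2x + 1 with a small alternating disturbance.
rows = [(float(x), 2.0 * x + 1.0 + (0.5 if x % 2 else -0.5)) for x in range(20)]

# Step 3: shuffle with a fixed seed and split 80/20 into train and test sets.
random.Random(0).shuffle(rows)
split = int(0.8 * len(rows))
train, test = rows[:split], rows[split:]
x_train, y_train = [x for x, _ in train], [y for _, y in train]
x_test, y_test = [x for x, _ in test], [y for _, y in test]

# Steps 4-5: train each candidate on the train set, evaluate on the test set.
candidates = {"mean_baseline": fit_mean, "linear": fit_linear}
scores = {
    name: mse(fit(x_train, y_train), x_test, y_test)
    for name, fit in candidates.items()
}

# Step 6: choose the model with the lowest test error.
best_name = min(scores, key=scores.get)
print(scores, best_name)
```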
The Four Assumptions of Linear Regression
1. Linear relationship: There exists a linear relationship between the
independent variable, x, and the dependent variable, y.
2. Independence: The residuals are independent. In particular, there is
no correlation between consecutive residuals in time series data.
3. Homoscedasticity: The residuals have constant variance at every
level of x.
4. Normality: The residuals of the model are normally distributed.
➢If one or more of these assumptions are violated, then the results of
our linear regression may be unreliable or even misleading.
Linear Regression Estimation
Basic equation

Then,
Conclusion
• Understanding how well a machine learning model will perform on
unseen data is the main purpose behind working with these
evaluation metrics.
• Metrics like accuracy, precision, recall are good ways to evaluate
classification models for balanced datasets, but if the data is
imbalanced then other methods like ROC/AUC perform better in
evaluating the model performance.
• ROC curve isn’t just a single number but it’s a whole curve that
provides nuanced details about the behavior of the classifier. It is also
hard to quickly compare many ROC curves to each other.
