Linear Regression
Linear regression is a statistical method that aims to model the relationship between a dependent
variable and one or more independent variables by fitting a linear equation to the observed data.
This widely-used technique serves as a fundamental building block in both statistics and machine
learning, providing valuable insights into the relationships between variables.
Basic Concept:
At its core, linear regression assumes a linear relationship between the dependent variable (for example, BP) and the independent variable(s) (for example, Age and Years). The relationship is expressed through a linear equation:
Y = β0 + β1X1 + β2X2 + ... + βnXn + ε
where:
β0 is the y-intercept
β1, β2, ..., βn are the coefficients of the independent variables
ε is the error term capturing the variability not explained by the linear relationship
Objective
The primary goal of linear regression is to determine the best-fitting line, minimizing the sum of
squared differences between the observed and predicted values. In other words, the model aims to
capture the underlying linear relationship between variables while accounting for variability.
Estimation of Coefficients:
The process of finding the optimal coefficients involves using statistical methods such as the least squares method. The coefficients β0, β1, ..., βn are estimated to create a model that accurately represents the data. This fitting process provides a mathematical representation of how changes in the independent variables correlate with changes in the dependent variable.
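As a concrete illustration, the sketch below fits such a model with scikit-learn's LinearRegression, which estimates the coefficients by ordinary least squares. The small BP/Age/Years dataset is invented purely for illustration and is not taken from any real study.
```python
# A minimal sketch of fitting a linear regression with scikit-learn.
# The blood-pressure data below is made up purely for illustration.
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical dataset: two independent variables (Age, Years) and one dependent variable (BP)
X = np.array([[47, 9], [49, 4], [52, 10], [48, 7], [57, 11], [60, 15]])  # Age, Years
y = np.array([120, 118, 129, 123, 135, 142])                             # BP

model = LinearRegression()        # ordinary least squares under the hood
model.fit(X, y)

print("Intercept (beta_0):", model.intercept_)
print("Coefficients (beta_1, beta_2):", model.coef_)
print("Predicted BP for Age=50, Years=8:", model.predict([[50, 8]])[0])
```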
Advantages:
Linear regression is computationally efficient and can handle large datasets effectively. It can be trained quickly, making it suitable for real-time applications.
Linear regression often serves as a good baseline model for comparison with more complex machine learning algorithms.
Linear regression is a well-established algorithm with a rich history and is widely available in various machine learning libraries and software packages.
Limitations:
Linear regression assumes a linear relationship between the dependent and independent variables. If the relationship is not linear, the model may not perform well.
Linear regression assumes that the features are already in a suitable form for the model. Feature engineering may be required to transform features into a format that the model can use effectively (see the sketch after this list).
Linear regression is susceptible to both overfitting and underfitting. Overfitting occurs when the model learns the training data too well and fails to generalize to unseen data. Underfitting occurs when the model is too simple to capture the underlying relationships in the data.
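As a minimal sketch of the feature-engineering point above, the example below expands a single feature into polynomial terms so that a linear model can capture a curved relationship. The data and the choice of a quadratic transform are assumptions made only for illustration.
```python
# A minimal sketch of simple feature engineering for linear regression,
# assuming the raw feature x has a roughly quadratic relationship with y.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures

x = np.array([[1.0], [2.0], [3.0], [4.0], [5.0]])      # hypothetical raw feature
y = np.array([1.1, 4.2, 8.8, 16.3, 24.9])              # roughly proportional to x**2

# Expand x into [x, x^2] so a *linear* model can capture the curved relationship
X_poly = PolynomialFeatures(degree=2, include_bias=False).fit_transform(x)

model = LinearRegression().fit(X_poly, y)
print("Coefficients for [x, x^2]:", model.coef_, "intercept:", model.intercept_)
```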
Linear Regression and Gradient Descent
The goal of the linear regression algorithm is to find the best-fit line equation that can predict the values of the dependent variable based on the independent variables.
In regression, a set of records with X and Y values is available, and these values are used to learn a function so that Y can be predicted for a new, unseen X. In other words, regression requires a function that predicts a continuous Y given X as the independent feature(s).
Our primary objective when using linear regression is to locate the best-fit line, which means the error between the predicted and actual values should be kept to a minimum; the best-fit line is the one with the least error. It is determined by finding the coefficients (β0 and β1) that minimize the sum of squared differences between the observed values of the dependent variable and the values predicted by the regression equation. This is typically achieved through the method of least squares.
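For the simple (single-feature) case, the least-squares coefficients have a well-known closed form: β1 = Σ(x − x̄)(y − ȳ) / Σ(x − x̄)² and β0 = ȳ − β1 x̄. The sketch below computes them from scratch with NumPy on made-up data.
```python
# A from-scratch sketch of the least-squares coefficients for simple linear regression,
# using the closed-form formulas for the slope (beta_1) and intercept (beta_0).
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])   # hypothetical data
y = np.array([2.1, 4.3, 5.9, 8.2, 9.8])

x_mean, y_mean = x.mean(), y.mean()
beta_1 = np.sum((x - x_mean) * (y - y_mean)) / np.sum((x - x_mean) ** 2)
beta_0 = y_mean - beta_1 * x_mean

print(f"Best-fit line: y = {beta_0:.3f} + {beta_1:.3f} * x")
```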
The best-fit line equation provides a straight line that represents the relationship between the
dependent and independent variables. The slope of the line indicates how much the dependent
variable changes for a unit change in the independent variable(s).
Understanding Relationships: The slope (β1) of the best fit line indicates the strength and direction
of the relationship between the variables. A positive slope suggests a positive correlation, while a
negative slope indicates a negative correlation.
Visual Representation: The best fit line is often plotted on a scatterplot of the data points. It visually
represents the linear trend in the data and helps assess how well the model fits the observed data.
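A quick way to see this is to plot the data points and the fitted line together. The sketch below uses NumPy's polyfit and matplotlib on hypothetical data; the specific values are assumptions for illustration only.
```python
# A quick visual check of the best-fit line on a scatterplot (matplotlib assumed available).
import numpy as np
import matplotlib.pyplot as plt

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])   # hypothetical data
y = np.array([2.1, 4.3, 5.9, 8.2, 9.8])

# Fit a degree-1 polynomial (a straight line) with NumPy's least-squares helper
beta_1, beta_0 = np.polyfit(x, y, deg=1)

plt.scatter(x, y, label="observed data")
plt.plot(x, beta_0 + beta_1 * x, color="red", label="best-fit line")
plt.xlabel("x (independent variable)")
plt.ylabel("y (dependent variable)")
plt.legend()
plt.show()
```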
In regression, the difference between the observed value of the dependent variable (yi) and the predicted value (ŷi) is called the residual:
εi = yi − ŷi
where ŷi = θ1 + θ2 xi, with θ1 the intercept and θ2 the slope of the fitted line.
The cost function helps to work out the optimal values for θ1 and θ2 that provide the best-fit line for the data points.
In linear regression, the Mean Squared Error (MSE) cost function is generally used; it is the average of the squared errors between the predicted values ŷi and the observed values yi:
J(θ1, θ2) = (1/n) Σ (ŷi − yi)²
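The sketch below implements this cost function directly; the data and the trial coefficient values are hypothetical.
```python
# A minimal sketch of the MSE cost function J(theta_1, theta_2) described above.
# theta_1 is the intercept and theta_2 the slope, matching the notation in the text.
import numpy as np

def mse_cost(theta_1, theta_2, x, y):
    y_pred = theta_1 + theta_2 * x          # predicted values
    residuals = y - y_pred                  # observed minus predicted
    return np.mean(residuals ** 2)          # average squared error

x = np.array([1.0, 2.0, 3.0, 4.0])          # hypothetical data
y = np.array([2.0, 4.1, 6.2, 7.9])
print(mse_cost(0.0, 2.0, x, y))             # cost for the trial line y = 0 + 2x
```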
Gradient Descent
A regression model uses the gradient descent algorithm to optimize the coefficients of the line: starting from randomly selected coefficient values, it iteratively updates them to reduce the cost function until a minimum is reached.
In other words, to update the θ1 and θ2 values so as to reduce the cost function (minimize the MSE) and achieve the best-fit line, the model uses gradient descent.
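The sketch below shows one way to run batch gradient descent on the MSE cost for the simple one-feature case. The learning rate, iteration count, and data are arbitrary choices for illustration, not tuned values.
```python
# A sketch of batch gradient descent for simple linear regression.
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])      # hypothetical data
y = np.array([2.1, 4.3, 5.9, 8.2, 9.8])

theta_1, theta_2 = 0.0, 0.0                   # initial coefficient values
lr, n = 0.01, len(x)                          # learning rate and sample count

for _ in range(5000):
    y_pred = theta_1 + theta_2 * x
    error = y_pred - y
    # Partial derivatives of the MSE cost with respect to each coefficient
    grad_theta_1 = (2.0 / n) * np.sum(error)
    grad_theta_2 = (2.0 / n) * np.sum(error * x)
    theta_1 -= lr * grad_theta_1
    theta_2 -= lr * grad_theta_2

print(f"Learned line: y = {theta_1:.3f} + {theta_2:.3f} * x")
```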
MSE vs MAE
Mean Squared Error (MSE) is the average squared error between the actual and predicted values. It is calculated as:
MSE = (1/n) Σ (yi − ŷi)²
MSE should be interpreted as an error metric where the closer the value is to 0, the more accurate the model. However, because MSE is the average of the squared errors, the resulting value is not expressed in the units of the model's target.
There is no general rule for how to interpret a given MSE value. Its magnitude is specific to each dataset, so it can mainly be used to say whether the model has become more or less accurate than a previous run.
Mean Absolute Error (MAE) is the average of the absolute differences between the predicted and actual values. It is calculated as:
MAE = (1/n) Σ |yi − ŷi|
MAE serves as an indicator of the accuracy of a predictive model: a lower MAE suggests a more accurate model. However, the interpretation of MAE is specific to the scale of the target variable being predicted. Unlike MSE, MAE is returned in the same units as the target variable, so while its magnitude is still dataset-dependent, it is easier to relate to the quantity being predicted.
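Both metrics are available in scikit-learn; the short sketch below computes them on hypothetical predictions to show that MSE is in squared target units while MAE stays in the target's units.
```python
# Computing both metrics with scikit-learn on hypothetical predictions.
from sklearn.metrics import mean_squared_error, mean_absolute_error

y_true = [3.0, 5.0, 7.5, 10.0]
y_pred = [2.8, 5.4, 7.0, 11.0]

print("MSE:", mean_squared_error(y_true, y_pred))   # squared units of the target
print("MAE:", mean_absolute_error(y_true, y_pred))  # same units as the target
```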
Choosing Between MSE and MAE in Specific Scenarios
The key difference between squared error and absolute error is that squared error punishes large errors to a greater extent than absolute error, because the errors are squared rather than taken as absolute differences.
Let's explore situations where Mean Squared Error (MSE) is more suitable than Mean Absolute Error (MAE), and vice versa.
MSE is preferable when:
1. MSE penalizes larger errors more heavily due to the squaring of differences. If your project is
particularly concerned about minimizing the impact of large errors and is more tolerant of small
errors, MSE may be more appropriate.
2. When using optimization algorithms to train machine learning models, MSE can offer better
numerical stability in certain cases. The squared term often leads to smoother and more well-
behaved optimization landscapes.
3. MSE amplifies the differences between small and large errors. This can be beneficial when you
want a metric that reflects and magnifies the variations in performance, making it easier to
distinguish between models with subtle differences.
MAE is preferable when:
1. If your dataset contains outliers and you want the metric to be less influenced by extreme values, MAE is the better choice in that situation (see the sketch after this list).
2. MAE provides error values in the same units as the target variable, making it more interpretable. If
clear communication of the error in a way that stakeholders can easily understand is crucial, MAE is
often preferred.
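To make the outlier point concrete, the sketch below compares MSE and MAE on hypothetical predictions where a single prediction is badly off; the numbers are invented for illustration.
```python
# A small illustration of outlier sensitivity: one large error inflates MSE far more than MAE.
from sklearn.metrics import mean_squared_error, mean_absolute_error

y_true = [10, 12, 11, 13, 12]
y_pred_clean   = [11, 11, 12, 12, 13]        # all errors are small
y_pred_outlier = [11, 11, 12, 12, 30]        # one prediction is badly off

for name, y_pred in [("clean", y_pred_clean), ("with outlier", y_pred_outlier)]:
    print(name,
          "MSE:", mean_squared_error(y_true, y_pred),
          "MAE:", mean_absolute_error(y_true, y_pred))
```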