2. Linear Regression, Polynomial, Gradient Descent

The document provides an overview of regression in machine learning, detailing various types of regression techniques such as linear, polynomial, and logistic regression. It discusses concepts such as univariate and multivariate regression, performance measures, gradient descent, and the bias-variance trade-off. Additionally, it outlines methods to reduce bias and variance in models to improve their predictive accuracy.

Regression

Regression in machine learning consists of mathematical methods that allow data scientists to predict a continuous outcome (y) based on the value of one or more predictor variables (x).

Ex.: Predicting the salary of an employee on the basis of years of experience.
Types of Regression Techniques/Models

Some common regression models are:

• Linear Regression
• Polynomial Regression
• Ridge Regression
• Lasso Regression
• Logistic Regression
Univariate and Multivariate
• Predicting a single output value is a univariate regression problem.
• Predicting multiple output values is a multivariate regression problem.
Simple Linear Regression
Ex.

Size in feet² (X)    Price in $1000s (Y)
2104                 460
1416                 232
1534                 315
852                  178
Notation:
m = number of training examples
x's = input variables / features
y's = output variable / target variable
Simple Linear Model

The general equation for the linear regression model is written as:

ŷ = θ₀ + θ₁x₁ + θ₂x₂ + … + θₙxₙ

where ŷ is the predicted value, n is the number of features, xᵢ is the i-th feature value, and θⱼ is the j-th model parameter (θ₀ is the bias term).

Hypothesis function in Linear Regression

For simple (univariate) linear regression, the hypothesis is h_θ(x) = θ₀ + θ₁x.

Linear regression model prediction (vectorized form)

ŷ = h_θ(x) = θ · x = θᵀx, where θ is the model's parameter vector and x is the instance's feature vector (with x₀ = 1).
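As a quick illustration (not part of the original slides), the sketch below fits θ to the example table from earlier with NumPy's least-squares solver and then computes the vectorized prediction ŷ = θᵀx for every instance; all variable names are illustrative.

```python
import numpy as np

# Example data from the table above: house size -> price.
X = np.array([[2104.0], [1416.0], [1534.0], [852.0]])
y = np.array([460.0, 232.0, 315.0, 178.0])

# Prepend the bias feature x0 = 1 so theta_0 is learned with theta_1.
X_b = np.c_[np.ones((len(X), 1)), X]

# Solve the least-squares problem for theta (closed form, for brevity).
theta, *_ = np.linalg.lstsq(X_b, y, rcond=None)

# Vectorized prediction: y_hat = X_b @ theta, i.e. h_theta(x) = theta^T x.
y_hat = X_b @ theta
print("theta:", theta)
print("predictions:", y_hat)
```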
Best Fit Line
• The best-fit line equation provides a straight line that represents the relationship between the dependent and independent variables.
• The primary objective in linear regression is to locate this best-fit line, which means the error between the predicted and actual values should be kept to a minimum.
Performance Measures
1. Root mean square error (RMSE):
RMSE = √( (1/m) Σᵢ (ŷ⁽ⁱ⁾ − y⁽ⁱ⁾)² )

2. Mean absolute error (MAE):
MAE = (1/m) Σᵢ |ŷ⁽ⁱ⁾ − y⁽ⁱ⁾|
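A minimal sketch of both measures in NumPy (the sample predictions are made up for illustration):

```python
import numpy as np

def rmse(y_true, y_pred):
    # Root mean square error: square root of the mean squared residual.
    return np.sqrt(np.mean((y_pred - y_true) ** 2))

def mae(y_true, y_pred):
    # Mean absolute error: mean of the absolute residuals.
    return np.mean(np.abs(y_pred - y_true))

y_true = np.array([460.0, 232.0, 315.0, 178.0])
y_pred = np.array([450.0, 240.0, 300.0, 190.0])  # illustrative predictions
print("RMSE:", rmse(y_true, y_pred))
print("MAE:", mae(y_true, y_pred))
```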
Cost function
The cost function is the error, i.e., the difference between the predicted values and the actual values, aggregated over the training set. A common choice for linear regression is the mean squared error (MSE): J(θ) = (1/m) Σᵢ (θᵀx⁽ⁱ⁾ − y⁽ⁱ⁾)².
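A one-function sketch of this cost, assuming X_b already contains the bias column (illustrative, not the slides' code):

```python
import numpy as np

def mse_cost(theta, X_b, y):
    # J(theta) = (1/m) * sum over i of (theta^T x_i - y_i)^2
    m = len(y)
    errors = X_b @ theta - y
    return (errors @ errors) / m
```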
Gradient Descent
Gradient descent is a generic optimization algorithm capable of finding optimal solutions to a wide range of problems.

Idea: Start by filling θ with random values (this is called random initialization). Then improve it gradually, taking one baby step at a time, each step attempting to decrease the cost function (e.g., the MSE), until the algorithm converges to a minimum.
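A minimal batch implementation of this idea for linear regression with the MSE cost (a sketch with illustrative names, not the slides' code): θ is randomly initialized, then repeatedly nudged along the negative gradient.

```python
import numpy as np

def batch_gradient_descent(X_b, y, lr=0.1, n_iterations=1000):
    m = len(y)
    theta = np.random.randn(X_b.shape[1])  # random initialization
    for _ in range(n_iterations):
        # Gradient of the MSE cost over the whole training set:
        # grad = (2/m) * X^T (X theta - y)
        gradients = (2.0 / m) * X_b.T @ (X_b @ theta - y)
        theta -= lr * gradients  # one "baby step" downhill
    return theta

# Usage on synthetic data y = 4 + 3x + noise:
np.random.seed(42)
X = 2 * np.random.rand(100, 1)
y = 4 + 3 * X[:, 0] + np.random.randn(100)
X_b = np.c_[np.ones((100, 1)), X]
print(batch_gradient_descent(X_b, y))  # roughly [4, 3]
```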
Step Size and Learning Rate
• If the learning rate is too small, then the algorithm will have to go through many iterations to converge, which will take a long time.
• If the learning rate is too high, you might jump across the valley and end up on the other side, possibly even higher up than you were before.
Local and Global Minima

Not every cost function is a smooth convex bowl: gradient descent can get stuck in a local minimum instead of reaching the global minimum. The MSE cost function for linear regression is convex, so it has a single global minimum.
Types of Gradient Descent
• Batch Gradient Descent
• Stochastic Gradient Descent
• Mini-Batch Gradient Descent
Batch Gradient Descent
In Batch Gradient Descent, all the training data is taken into
consideration to take a single step.
Stochastic Gradient Descent

In Stochastic Gradient Descent (SGD), we consider just one example at a time to take a single step.

The algorithm is much faster because it has very little data to manipulate at every iteration.
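A sketch of the same loop in SGD form (illustrative): each update uses the gradient of a single, randomly chosen example.

```python
import numpy as np

def stochastic_gradient_descent(X_b, y, n_epochs=50, lr=0.01):
    m = len(y)
    theta = np.random.randn(X_b.shape[1])  # random initialization
    for _ in range(n_epochs):
        for _ in range(m):
            i = np.random.randint(m)       # pick one example at random
            xi, yi = X_b[i], y[i]
            # Gradient of the squared error of this single instance.
            gradients = 2.0 * xi * (xi @ theta - yi)
            theta -= lr * gradients
    return theta
```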
Mini-Batch Gradient Descent

Mini-batch GD computes the gradients on small random sets of instances called mini-batches. The main advantage of mini-batch GD over stochastic GD is that you can get a performance boost from hardware optimization of matrix operations, especially when using GPUs.
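A mini-batch sketch (illustrative): the per-batch update is a matrix product, which is exactly the kind of operation that vectorized hardware and GPUs accelerate.

```python
import numpy as np

def minibatch_gradient_descent(X_b, y, n_epochs=50, batch_size=16, lr=0.05):
    m = len(y)
    theta = np.random.randn(X_b.shape[1])  # random initialization
    for _ in range(n_epochs):
        idx = np.random.permutation(m)     # shuffle once per epoch
        for start in range(0, m, batch_size):
            batch = idx[start:start + batch_size]
            Xb, yb = X_b[batch], y[batch]
            # MSE gradient on the mini-batch only.
            gradients = (2.0 / len(batch)) * Xb.T @ (Xb @ theta - yb)
            theta -= lr * gradients
    return theta
```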
Polynomial Regression

• Add powers of each feature as new features, then train a linear model on this extended set of features. This technique is called polynomial regression.
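With scikit-learn this is a two-step pipeline, sketched below on synthetic quadratic data (the dataset and degree are assumptions for illustration):

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

# Quadratic data: y = 0.5 x^2 + x + 2 + noise.
np.random.seed(42)
X = 6 * np.random.rand(100, 1) - 3
y = 0.5 * X[:, 0] ** 2 + X[:, 0] + 2 + np.random.randn(100)

# Step 1: add powers of each feature (x -> x, x^2).
# Step 2: train a plain linear model on the extended feature set.
model = make_pipeline(PolynomialFeatures(degree=2, include_bias=False),
                      LinearRegression())
model.fit(X, y)
print(model.predict([[1.5]]))  # prediction for x = 1.5
```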
Overfitting and Underfitting

A high-degree polynomial regression model can severely overfit the training data, while a linear model underfits it.
Learning Curve

• Learning curves are plots of the model's training error and validation error as a function of the training iteration (or the training set size).
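A sketch of how such curves can be produced, here plotted against the training set size, a common variant (model, data, and names are illustrative):

```python
import matplotlib.pyplot as plt
import numpy as np
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

def plot_learning_curves(model, X, y):
    # Train on ever-larger subsets and record train/validation RMSE.
    X_train, X_val, y_train, y_val = train_test_split(
        X, y, test_size=0.2, random_state=42)
    train_errors, val_errors = [], []
    for m in range(2, len(X_train) + 1):
        model.fit(X_train[:m], y_train[:m])
        train_errors.append(
            mean_squared_error(y_train[:m], model.predict(X_train[:m])))
        val_errors.append(
            mean_squared_error(y_val, model.predict(X_val)))
    plt.plot(np.sqrt(train_errors), "r-+", label="training")
    plt.plot(np.sqrt(val_errors), "b-", label="validation")
    plt.xlabel("training set size")
    plt.ylabel("RMSE")
    plt.legend()
    plt.show()
```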
Learning Curve

• Learning curves for an overfitting model.
• Note: One way to improve an
overfitting model is to feed it
more training data until the
validation error reaches the
training error.
Cross Validation

• Cross-validation is a technique used in machine learning to evaluate the performance of a model on unseen data. It involves dividing the available data into multiple folds or subsets, using one of these folds as a validation set, and training the model on the remaining folds.
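A minimal scikit-learn sketch of 5-fold cross-validation (synthetic data; the scoring choice is an assumption):

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score

np.random.seed(42)
X = 2 * np.random.rand(100, 1)
y = 4 + 3 * X[:, 0] + np.random.randn(100)

# Each of the 5 folds serves once as the validation set while the
# model is trained on the remaining 4 folds.
scores = cross_val_score(LinearRegression(), X, y,
                         scoring="neg_mean_squared_error", cv=5)
print("RMSE per fold:", np.sqrt(-scores))
```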
The bias/variance trade-off

The model's generalization error can be expressed as the sum of three very different errors:
• Bias
• Variance
• Irreducible error
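For a squared-error loss this decomposition is commonly written as follows, where σ² is the variance of the noise (the irreducible part):

$$
\mathbb{E}\big[(y - \hat{f}(x))^2\big]
= \big(\operatorname{Bias}[\hat{f}(x)]\big)^2
+ \operatorname{Var}[\hat{f}(x)]
+ \sigma^2
$$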
What is Bias?
• This part of the generalization error is due to wrong assumptions, such as assuming the data is linear when it is actually more complex.
• High bias gives a large error on both training and testing data. It is recommended that an algorithm should always be low-biased to avoid the problem of underfitting.
• With high bias, the predictions follow a straight-line form that does not fit the data in the data set accurately. Such fitting is known as underfitting of data.
• This happens when the hypothesis is too simple or linear in nature.

In such a problem, the hypothesis is a very simple function, e.g. h_θ(x) = θ₀ + θ₁x.
A model can be in either of two situations:
• Low bias – Low bias value implies fewer assumptions have been
made to build the target function. In this scenario, the model will
closely match the training dataset.
• High bias – High bias value implies more assumptions have been
made to build the target function. In this scenario, the model will not
match the dataset closely.
A high-bias model will be unable to capture the dataset trend. It has a
high error rate and is considered an underfitting model. This happens
because of a very simplified algorithm. For instance, a linear regression
model might be biased if the data has a non-linear relationship.
Ways To Reduce High Bias
Since we have discussed some disadvantages of having high bias, here
are some ways to reduce high bias in machine learning.
• Use a complex model: The extremely simplified model is the main
cause of high bias. It is incapable of capturing the data complexity. In
such scenarios, the model can be made more complex.
• Increase the training data size: Increasing the training data size can
help reduce bias. This is because the model is being provided with
more examples to learn from the dataset.
• Increase the features: Increasing the number of features will
increase the complexity of the model. This improves the ability of
the model to capture the underlying data patterns.
• Reduce regularisation of the model: L1 and L2 regularisation can
help prevent overfitting and improve the model’s generalisation
ability. Reducing the regularisation or removing it completely can
help improve the performance.
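To make the last point concrete, here is a small sketch of weakening L2 regularisation with scikit-learn's Ridge (the data and alpha values are illustrative; Lasso exposes the same alpha knob for L1):

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso, Ridge

X, y = make_regression(n_samples=100, n_features=10, noise=10.0,
                       random_state=42)

# Strong L2 regularisation (large alpha) shrinks the weights and can
# introduce bias; a smaller alpha relaxes the constraint.
strong = Ridge(alpha=100.0).fit(X, y)
weak = Ridge(alpha=0.01).fit(X, y)
print("R^2 strong:", strong.score(X, y))
print("R^2 weak:  ", weak.score(X, y))

# Lasso (L1) is tuned the same way via its alpha parameter.
lasso = Lasso(alpha=0.1).fit(X, y)
```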
What is Variance?
• This part of the generalization error is due to the model's excessive sensitivity to small variations in the training data. A model with many degrees of freedom (such as a high-degree polynomial model) is likely to have high variance and thus overfit the training data.
• When a model is high on variance, it is said to overfit the data.
• Overfitting means fitting the training set very accurately via a complex curve and a high-order hypothesis, but this is not a good solution, as the error on unseen data is high.
• While training a model, variance should be kept low.
In such a problem, the hypothesis is a very complex function, e.g. a high-degree polynomial h_θ(x) = θ₀ + θ₁x + θ₂x² + … + θₙxⁿ, whose curve chases the individual training points.
Variance error is either low or high:
• Low variance: Low variance implies that the ML model is less sensitive to changes in the training data. The model will produce consistent estimates of the target function across different data subsets from the same distribution. Combined with high bias, this leads to underfitting, where the model cannot generalize well on either test or training data.
• High variance: High variance implies that the ML model is very sensitive to changes in the training data. When trained on different subsets of data from the same distribution, the model's estimate of the target function can change significantly. This scenario is known as overfitting: the model does well on the training data but not on any new data.
Ways To Reduce High Variance
Here are some ways high variance can be reduced:
• Simplifying the model: Decreasing the number of parameters of
neural network layers can help reduce the complexity of the model.
This, in turn, helps in reducing the variance of the model.
• Ensemble methods: Boosting, stacking and bagging are common
ensemble techniques that can help reduce the variance of an ML
model and improve the generalisation performance.
• Early stopping: This is a technique for preventing overfitting by stopping model training when the performance on a validation set stops improving.
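A sketch of early stopping with scikit-learn's SGDRegressor (the data and hyperparameters are illustrative): train one epoch at a time, snapshot the model whenever the validation error improves, and keep the best snapshot.

```python
import copy

import numpy as np
from sklearn.linear_model import SGDRegressor
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

np.random.seed(42)
X = 2 * np.random.rand(200, 1)
y = 4 + 3 * X[:, 0] + np.random.randn(200)
X_train, X_val, y_train, y_val = train_test_split(X, y, random_state=42)

# max_iter=1 + warm_start=True means each .fit() call runs one more epoch.
model = SGDRegressor(max_iter=1, tol=None, warm_start=True, penalty=None,
                     learning_rate="constant", eta0=0.005)
best_error, best_model = float("inf"), None
for epoch in range(500):
    model.fit(X_train, y_train)
    val_error = mean_squared_error(y_val, model.predict(X_val))
    if val_error < best_error:
        best_error, best_model = val_error, copy.deepcopy(model)
```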
Bias Variance Tradeoff
• If the algorithm is too simple (a hypothesis with a linear equation), then it may be in a high-bias, low-variance condition and thus be error-prone.
• If the algorithm is too complex (a hypothesis with a high-degree equation), then it may be in a high-variance, low-bias condition.
• In the latter condition, the model will not perform well on new entries. There is a balance between these two conditions, known as the bias-variance trade-off.
• This trade-off in complexity is why there is a trade-off between bias and variance: an algorithm cannot be more complex and less complex at the same time.
1. Low-Bias, Low-Variance: This combination is the ideal machine learning model. However, it is practically very difficult to achieve.
2. Low-Bias, High-Variance: This is a case of overfitting, where model predictions are inconsistent but accurate on average. The predicted values will be accurate on average but scattered.
3. High-Bias, Low-Variance: This is a case of underfitting, where predictions are consistent but inaccurate on average. The predicted values will be inaccurate but not scattered.
4. High-Bias, High-Variance: With high bias and high variance, predictions are inconsistent and also inaccurate on average.
