
UNIT - V Regression Model

Contents:

➢ Introduction
➢ Types of regression
➢ Simple regression: Types, Making predictions, Cost function, Gradient descent, Training, Model evaluation
➢ Multivariable regression: Growing complexity, Normalization, Making predictions, Initialize weights, Cost function, Gradient descent, Simplifying with matrices, Bias term, Model evaluation

Regression
The term regression is used when you try to find the relationship between variables.

In machine learning, and in statistical modeling, that relationship is used to predict the outcome of future events.

Regression Analysis in Machine Learning


Regression analysis is a statistical method to model the relationship between a dependent (target) variable and one or more independent (predictor) variables. More specifically, regression analysis helps us understand how the value of the dependent variable changes in response to one independent variable while the other independent variables are held fixed. It predicts continuous/real values such as temperature, age, salary, price, etc.

We can understand the concept of regression analysis using the below example:

Example: Suppose there is a marketing company A that runs various advertisements every year and earns sales from them. The list below shows the advertisement spend of the company over the last 5 years and the corresponding sales:

Now the company wants to spend $200 on advertisement in the year 2019 and wants to predict its sales for this year. To solve such prediction problems in machine learning, we need regression analysis.
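To make the example concrete, here is a minimal sketch of such a prediction using scikit-learn. The yearly spend and sales figures below are hypothetical placeholders, since the original table is not reproduced here:

```python
# Hedged sketch: predict sales from advertisement spend with linear
# regression. All figures are hypothetical, standing in for the 5-year table.
import numpy as np
from sklearn.linear_model import LinearRegression

ad_spend = np.array([[90], [120], [150], [100], [130]])  # hypothetical yearly ad spend ($)
sales = np.array([1000, 1300, 1800, 1200, 1380])         # hypothetical yearly sales

model = LinearRegression().fit(ad_spend, sales)
print(model.predict([[200]]))  # predicted sales for a $200 advertisement spend
```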

Regression is a supervised learning technique that helps in finding the correlation between variables and enables us to predict a continuous output variable based on one or more predictor variables. It is mainly used for prediction, forecasting, time-series modeling, and determining cause-and-effect relationships between variables.

In regression, we plot a graph between the variables that best fits the given datapoints; using this plot, the machine learning model can make predictions about the data. In simple words, "Regression shows a line or curve that passes through the datapoints on the target-predictor graph in such a way that the vertical distance between the datapoints and the regression line is minimum." The distance between the datapoints and the line tells whether the model has captured a strong relationship or not.

Some examples of regression are:

o Prediction of rain using temperature and other factors
o Determining market trends
o Prediction of road accidents due to rash driving
Terminologies Related to Regression Analysis:
o Dependent Variable: The main factor in regression analysis that we want to predict or understand is called the dependent variable. It is also called the target variable.
o Independent Variable: The factors that affect the dependent variable, or that are used to predict its values, are called independent variables, also known as predictors.
o Outliers: An outlier is an observation with either a very low or a very high value in comparison to the other observed values. Outliers may hamper the result, so they should be avoided.
o Multicollinearity: If the independent variables are highly correlated with each other, the condition is called multicollinearity. It should not be present in the dataset, because it creates problems when ranking the most influential variables.
o Underfitting and Overfitting: If our algorithm works well on the training dataset but not on the test dataset, the problem is called overfitting. If the algorithm does not perform well even on the training dataset, the problem is called underfitting.

Why do we use Regression Analysis?

As mentioned above, regression analysis helps in the prediction of a continuous variable. There are various real-world scenarios where we need future predictions, such as weather conditions, sales figures, or marketing trends, and for such cases we need a technique that can make predictions accurately. Regression analysis is such a statistical method, used in machine learning and data science. Below are some other reasons for using regression analysis:

o Regression estimates the relationship between the target and the independent variables.
o It is used to find trends in data.
o It helps to predict real/continuous values.
o By performing regression, we can determine the most important factor, the least important factor, and how the factors affect each other.
Types of Regression
There are various types of regression used in data science and machine learning. Each type has its own importance in different scenarios, but at the core, all regression methods analyze the effect of the independent variables on the dependent variable. Here we discuss some important types of regression, given below:

o Linear Regression
o Logistic Regression
o Polynomial Regression
o Support Vector Regression
o Decision Tree Regression
o Random Forest Regression
o Ridge Regression
o Lasso Regression
Linear Regression:
o Linear regression is a statistical regression method used for predictive analysis.
o It is one of the simplest and easiest algorithms; it works on regression and shows the relationship between continuous variables.
o It is used for solving regression problems in machine learning.
o Linear regression shows the linear relationship between the independent variable (X-axis) and the dependent variable (Y-axis), hence the name linear regression.
o If there is only one input variable (x), such linear regression is called simple linear regression. If there is more than one input variable, it is called multiple linear regression.
o The relationship between the variables in a linear regression model can be illustrated by predicting the salary of an employee on the basis of years of experience.

o Below is the mathematical equation for linear regression:

Y = aX + b

Here,
Y = dependent variable (target variable),
X = independent variable (predictor variable),
a and b are the linear coefficients.

Some popular applications of linear regression are:

o Analyzing trends and sales estimates
o Salary forecasting
o Real estate prediction
o Arriving at ETAs in traffic

Logistic Regression:
o Logistic regression is another supervised learning algorithm, used to solve classification problems. In classification problems, the dependent variable is in a binary or discrete format, such as 0 or 1.
o The logistic regression algorithm works with categorical variables such as 0 or 1, Yes or No, True or False, Spam or Not Spam, etc.
o It is a predictive analysis algorithm that works on the concept of probability.
o Logistic regression is a type of regression, but it differs from the linear regression algorithm in how it is used.
o Logistic regression uses the sigmoid function, or logistic function, to model the data. The function can be represented as:

f(x) = 1 / (1 + e^(-x))

Here,
o f(x) = output between the 0 and 1 values
o x = input to the function
o e = base of the natural logarithm

When we provide the input values (data) to the function, it gives an S-shaped curve.
o It uses the concept of threshold levels: values above the threshold level are rounded up to 1, and values below the threshold level are rounded down to 0.

There are three types of logistic regression:

o Binary (0/1, pass/fail)
o Multinomial (cats, dogs, lions)
o Ordinal (low, medium, high)
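A minimal sketch of the sigmoid function and the threshold rule, assuming the standard form f(x) = 1 / (1 + e^(-x)) and a 0.5 threshold:

```python
# Sketch of the logistic (sigmoid) function with a 0.5 decision threshold.
import numpy as np

def sigmoid(x):
    # Map any real-valued input to a value between 0 and 1 (the S-curve).
    return 1.0 / (1.0 + np.exp(-x))

scores = np.array([-3.0, -0.5, 0.0, 0.5, 3.0])
probabilities = sigmoid(scores)
labels = (probabilities >= 0.5).astype(int)  # above threshold -> 1, below -> 0
print(probabilities)
print(labels)  # [0 0 1 1 1]
```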

Polynomial Regression:
o Polynomial regression is a type of regression that models a non-linear dataset using a linear model.
o It is similar to multiple linear regression, but it fits a non-linear curve between the values of x and the corresponding conditional values of y.
o Suppose there is a dataset whose datapoints lie in a non-linear fashion; in such a case, linear regression will not fit those datapoints well. To cover such datapoints, we need polynomial regression.
o In polynomial regression, the original features are transformed into polynomial features of a given degree and then modeled using a linear model, which means the datapoints are best fitted by a polynomial curve.
o The equation for polynomial regression is also derived from the linear regression equation: the linear regression equation Y = b0 + b1x is transformed into the polynomial regression equation Y = b0 + b1x + b2x² + b3x³ + ..... + bnxⁿ.
o Here Y is the predicted/target output, b0, b1, ..., bn are the regression coefficients, and x is our independent/input variable.
o The model is still linear because the coefficients b0, ..., bn enter linearly, even though the transformed features (x, x², ...) are non-linear in x.

Note: This is different from multiple linear regression in that, in polynomial regression, a single feature is raised to different degrees instead of using multiple variables with the same degree.
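A minimal sketch of polynomial regression with scikit-learn, where a single feature x is expanded into polynomial features of degree 2 and then fitted with an ordinary linear model (the data below is purely illustrative):

```python
# Sketch: transform x into polynomial features, then fit a linear model.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

x = np.linspace(-3, 3, 30).reshape(-1, 1)
y = 1 + 2 * x.ravel() + 0.5 * x.ravel() ** 2  # a quadratic relationship

model = make_pipeline(PolynomialFeatures(degree=2), LinearRegression())
model.fit(x, y)
print(model.predict([[2.0]]))  # close to 1 + 2*2 + 0.5*4 = 7
```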
Support Vector Regression:
Support Vector Machine is a supervised learning algorithm that can be used for regression as well as classification problems. When we use it for regression problems, it is termed Support Vector Regression (SVR).

Support Vector Regression is a regression algorithm that works for continuous variables. Below are some keywords used in Support Vector Regression:

o Kernel: A function used to map lower-dimensional data into higher-dimensional data.
o Hyperplane: In a general SVM, it is the separating line between two classes, but in SVR it is the line that helps predict the continuous variable and covers most of the datapoints.
o Boundary lines: The two lines on either side of the hyperplane that create a margin for the datapoints.
o Support vectors: The datapoints that are nearest to the hyperplane and define the margin.

In SVR, we always try to determine a hyperplane with a maximum margin, so that the maximum number of datapoints is covered within that margin. The main goal of SVR is to include the maximum number of datapoints within the boundary lines; the hyperplane (best-fit line) must cover a maximum number of datapoints. In a plot of SVR, the central line is the hyperplane, and the two lines on either side of it are the boundary lines.
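A minimal sketch of SVR with scikit-learn, assuming an RBF kernel; the epsilon parameter sets the half-width of the margin between the boundary lines around the hyperplane (the data is illustrative only):

```python
# Sketch: support vector regression on a noiseless sine curve.
import numpy as np
from sklearn.svm import SVR

rng = np.random.default_rng(0)
X = np.sort(rng.uniform(0, 5, size=(40, 1)), axis=0)
y = np.sin(X).ravel()

model = SVR(kernel="rbf", C=10.0, epsilon=0.1)  # epsilon = margin half-width
model.fit(X, y)
print(model.predict([[2.5]]))  # close to sin(2.5) ~ 0.60
```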

Decision Tree Regression:
o Decision tree is a supervised learning algorithm that can be used for solving both classification and regression problems.
o It can solve problems for both categorical and numerical data.
o Decision tree regression builds a tree-like structure in which each internal node represents a "test" on an attribute, each branch represents a result of the test, and each leaf node represents the final decision or result.
o A decision tree is constructed starting from the root node (the whole dataset), which splits into left and right child nodes (subsets of the dataset). These child nodes are further divided into their own children, and so become the parent nodes of those nodes.
o A classic illustration of decision tree regression is a model that tries to predict a person's choice between a sports car and a luxury car.
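A minimal sketch of decision tree regression with scikit-learn; each internal node tests an attribute and each leaf holds a predicted value (the data is illustrative only):

```python
# Sketch: fit a shallow regression tree and query one point.
import numpy as np
from sklearn.tree import DecisionTreeRegressor

X = np.array([[1], [2], [3], [4], [5], [6]])
y = np.array([1.2, 1.9, 3.1, 3.9, 5.2, 5.8])

model = DecisionTreeRegressor(max_depth=2)  # small tree for readability
model.fit(X, y)
print(model.predict([[3.5]]))  # value stored at the matching leaf
```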

Random Forest Regression:
o Random forest is one of the most powerful supervised learning algorithms; it is capable of performing regression as well as classification tasks.
o Random forest regression is an ensemble learning method that combines multiple decision trees and predicts the final output based on the average of each tree's output. The combined decision trees are called base models, and the model can be represented more formally as:

g(x) = f0(x) + f1(x) + f2(x) + ....

o Random forest uses the bagging (bootstrap aggregation) technique of ensemble learning, in which the aggregated decision trees run in parallel and do not interact with each other.
o With the help of random forest regression, we can prevent overfitting in the model by creating random subsets of the dataset.
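A minimal sketch of random forest regression with scikit-learn: many trees are trained on bootstrap samples and their outputs are averaged (the data is illustrative only):

```python
# Sketch: an ensemble of bagged decision trees whose outputs are averaged.
import numpy as np
from sklearn.ensemble import RandomForestRegressor

X = np.array([[1], [2], [3], [4], [5], [6]])
y = np.array([1.2, 1.9, 3.1, 3.9, 5.2, 5.8])

model = RandomForestRegressor(n_estimators=100, random_state=0)
model.fit(X, y)
print(model.predict([[3.5]]))  # average of the individual tree predictions
```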
Ridge Regression:
o Ridge regression is one of the most robust versions of linear regression, in which a small amount of bias is introduced so that we can get better long-term predictions.
o The amount of bias added to the model is known as the ridge regression penalty. We compute this penalty term by multiplying the lambda by the squared weight of each individual feature.
o The equation (cost function) for ridge regression is:

Cost = ∑(y − ŷ)² + λ·∑β²

o A general linear or polynomial regression will fail if there is high collinearity between the independent variables; to solve such problems, ridge regression can be used.
o Ridge regression is a regularization technique used to reduce the complexity of the model. It is also called L2 regularization.
o It helps to solve problems where we have more parameters than samples.
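A minimal sketch of ridge regression with scikit-learn; its alpha parameter plays the role of the lambda penalty weight above (the data is illustrative only):

```python
# Sketch: L2-regularized linear regression on correlated features.
import numpy as np
from sklearn.linear_model import Ridge

X = np.array([[1.0, 2.0], [2.0, 4.1], [3.0, 5.9], [4.0, 8.2]])  # highly correlated columns
y = np.array([3.0, 6.0, 9.0, 12.0])

model = Ridge(alpha=1.0)  # alpha is the lambda penalty weight
model.fit(X, y)
print(model.coef_)  # weights are shrunk toward, but not exactly to, zero
```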
Lasso Regression:
o Lasso regression is another regularization technique used to reduce the complexity of the model.
o It is similar to ridge regression, except that the penalty term contains the absolute values of the weights instead of their squares.
o Since it takes absolute values, it can shrink a slope to exactly 0, whereas ridge regression can only shrink it close to 0.
o It is also called L1 regularization. The equation (cost function) for lasso regression is:

Cost = ∑(y − ŷ)² + λ·∑|β|
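A minimal sketch of lasso regression with scikit-learn; unlike ridge, it can drive weak coefficients exactly to zero (the data is illustrative only):

```python
# Sketch: L1-regularized linear regression zeroing out a weak feature.
import numpy as np
from sklearn.linear_model import Lasso

X = np.array([[1.0, 0.1], [2.0, 0.2], [3.0, 0.1], [4.0, 0.3]])
y = np.array([2.0, 4.0, 6.0, 8.0])  # depends essentially on the first feature only

model = Lasso(alpha=0.5)
model.fit(X, y)
print(model.coef_)  # the second coefficient is typically shrunk to exactly 0
```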

Linear Regression-

In Machine Learning,
• Linear Regression is a supervised machine learning algorithm.
• It tries to find out the best linear relationship that describes the data you have.
• It assumes that there exists a linear relationship between a dependent variable
and independent variable(s).
• The value of the dependent variable of a linear regression model is a continuous value, i.e., a real number.

Representing Linear Regression Model-


A linear regression model represents the linear relationship between a dependent variable and independent variable(s) via a sloped straight line.
The sloped straight line representing the linear relationship that fits the given data best is called the regression line.
It is also called the best-fit line.

Selection Criteria (Linear Regression)

1. Classification and regression capabilities: It predicts a continuous variable (for example, the temperature of a place).
2. Data quality: Each missing value removes one data point that could have helped optimize the regression.
3. Computational complexity: Linear regression is usually less computationally expensive than a decision tree or a clustering algorithm.
4. Comprehensible and transparent: Linear regression is easily understandable, and its simple mathematical notation makes the model transparent.

Types of Linear Regression-

Based on the number of independent variables, there are two types of linear regression-
1. Simple Linear Regression
2. Multiple Linear Regression

1. Simple Linear Regression-

In simple linear regression, the dependent variable depends only on a single independent variable.

For simple linear regression, the form of the model is-


Y = β0 + β1X

Here,
• Y is a dependent variable.
• X is an independent variable.
• β0 and β1 are the regression coefficients.
• β0 is the intercept or the bias that fixes the offset to a line.
• β1 is the slope or weight that specifies the factor by which X has an impact on Y.

There are the following 3 possible cases-

Case-01: β1 < 0
• It indicates that variable X has a negative impact on Y.
• If X increases, Y will decrease, and vice-versa.

Case-02: β1 = 0
• It indicates that variable X has no impact on Y.
• If X changes, there will be no change in Y.

Case-03: β1 > 0
• It indicates that variable X has a positive impact on Y.
• If X increases, Y will increase, and vice-versa.
2. Multiple Linear Regression-

In multiple linear regression, the dependent variable depends on more than one independent variable.

For multiple linear regression, the form of the model is-

Y = β0 + β1X1 + β2X2 + β3X3 + …… + βnXn

Here,
• Y is a dependent variable.
• X1, X2, …., Xn are independent variables.
• β0, β1, …, βn are the regression coefficients.
• βj (1 <= j <= n) is the slope or weight that specifies the factor by which Xj has an impact on Y.
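A minimal sketch of multiple linear regression with scikit-learn, recovering the coefficients of Y = β0 + β1X1 + β2X2 from illustrative data generated with β0 = 1, β1 = 2, β2 = 1:

```python
# Sketch: fit a plane Y = b0 + b1*X1 + b2*X2 and read back the coefficients.
import numpy as np
from sklearn.linear_model import LinearRegression

X = np.array([[1, 2], [2, 1], [3, 4], [4, 3], [5, 5]])  # two features X1, X2
y = np.array([5, 6, 11, 12, 16])                        # y = 1 + 2*X1 + 1*X2

model = LinearRegression().fit(X, y)
print(model.intercept_, model.coef_)  # approximately 1.0, [2.0, 1.0]
print(model.predict([[6, 2]]))        # 1 + 2*6 + 1*2 = 15
```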

Understanding Linear Regression (example)

First of all, we need to have some data set to design the model.

Let us say the data is as below

x y
1 3
2 4
3 2
4 4
5 5

The values given are actual values.

Based on the above data, the line that most closely fits is of the form y = mx + c, where m is the slope of the line and c is the Y-intercept.

From now on, the mean of x is referred to as x(m) and the mean of y as y(m).

As per the least squares method,

m = ∑(x - x(m))(y - y(m)) / ∑(x - x(m))²  and  c = y(m) - m·x(m)

As per above data table, x(m)=3, y(m)=3.6.

x    y    x-x(m)    y-y(m)    (x-x(m))²    (x-x(m))(y-y(m))
1    3    -2        -0.6      4            1.2
2    4    -1        0.4       1            -0.4
3    2    0         -1.6      0            0
4    4    1         0.4       1            0.4
5    5    2         1.4       4            2.8
From the table, ∑(x - x(m))(y - y(m)) = 4 and ∑(x - x(m))² = 10, so m = 4/10 = 0.4 and c = 3.6 - 0.4×3 = 2.4, giving the line equation y = 0.4x + 2.4.

x-x(m) is the horizontal distance of each point x from the line x = 3.

y-y(m) is the vertical distance of each point y from the line y = 3.6.

Now we will calculate the predicted values of y based on the equation y = mx + c, where m = 0.4 and c = 2.4.

For x = 1, y = 0.4*1 + 2.4 = 2.8
For x = 2, y = 0.4*2 + 2.4 = 3.2
For x = 3, y = 0.4*3 + 2.4 = 3.6
For x = 4, y = 0.4*4 + 2.4 = 4.0
For x = 5, y = 0.4*5 + 2.4 = 4.4

Now that we have the actual and predicted values of y, we need to calculate the distance between them and then reduce it; in other words, we need to reduce the error, and the line with the minimum error is the best-fit regression line.
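The worked example above can be reproduced with a short NumPy sketch of the least squares formulas:

```python
# Sketch: compute slope m, intercept c, and predictions for the table above.
import numpy as np

x = np.array([1, 2, 3, 4, 5])
y = np.array([3, 4, 2, 4, 5])

x_m, y_m = x.mean(), y.mean()  # x(m) = 3, y(m) = 3.6
m = ((x - x_m) * (y - y_m)).sum() / ((x - x_m) ** 2).sum()  # 4/10 = 0.4
c = y_m - m * x_m  # 3.6 - 0.4*3 = 2.4

print(m, c)       # 0.4 2.4
print(m * x + c)  # [2.8 3.2 3.6 4.0 4.4], the predicted values
```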

Finding the best fit line:

For different values of m, we need to calculate the line equation y = mx + c; as the value of m changes, the equation changes. After every iteration, the predicted values change according to the line's equation and are compared with the actual values; the value of m that gives the minimum difference defines the best-fit line.

Let's check the goodness of fit:

To test how well our model is performing, we have a method called the R-squared method.

R-squared method

This method is based on a value called the R-squared value. It measures how close the data is to the regression line and is also known as the coefficient of determination.

To check how good our model is, we compare the distance between the actual values and the mean against the distance between the predicted values and the mean; this gives the R² formula:

R² = ∑(yp - y(m))² / ∑(y - y(m))²

If the value of R² is near 1, the model is more effective.

If the value of R² is far from 1, the model is less effective.

x    y    y-y(m)    (y-y(m))²    yp    (yp-y(m))²
1    3    -0.6      0.36         2.8   0.64
2    4    0.4       0.16         3.2   0.16
3    2    -1.6      2.56         3.6   0
4    4    0.4       0.16         4.0   0.16
5    5    1.4       1.96         4.4   0.64

R² = 1.6/5.2 ≈ 0.31
This means that the data points are far away from the regression line.

If the value of R² were 1, the actual data points would lie on the regression line.
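Continuing the NumPy sketch, the R² value from the table can be computed as:

```python
# Sketch: R-squared as explained variation over total variation.
import numpy as np

y = np.array([3, 4, 2, 4, 5])                 # actual values
y_pred = np.array([2.8, 3.2, 3.6, 4.0, 4.4])  # predictions from y = 0.4x + 2.4
y_m = y.mean()                                # 3.6

r2 = ((y_pred - y_m) ** 2).sum() / ((y - y_m) ** 2).sum()
print(r2)  # 1.6 / 5.2 ~ 0.31
```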

Conclusion

We have covered the main topics related to linear regression, and we also measured the effectiveness of the model using the R-squared method. For example, the R² value might come close to 1 for data such as a company's sales, while it might be quite low for psychological data, since different people have different characteristics. The conclusion is that the closer the R² value is to 1, the more accurate the predicted values are.
