
Chapter-8

Supervised Learning: Regression


REGRESSION
• Regression is essentially finding a relationship (or association) between the dependent variable (Y) and the independent variable(s) (X), i.e. finding the function ‘f’ for the association

Y = f(X).
COMMON REGRESSION ALGORITHMS
• The most common regression algorithms are
• Simple linear regression
• Multiple linear regression
• Polynomial regression
• Multivariate adaptive regression splines
• Logistic regression
• Maximum likelihood estimation (least squares)
Simple Linear Regression
• involves only one predictor.
• This model assumes a linear relationship
between the dependent variable and the
predictor variable.
Slope of the simple linear regression model

• Slope = Change in Y / Change in X
• Rise is the change along the Y-axis (Y2 − Y1) and Run is the change along the X-axis (X2 − X1). So the slope is represented as:

Slope = Rise/Run = (Y2 − Y1) / (X2 − X1)

• For example, between the points (2, 3) and (4, 7) the slope is (7 − 3) / (4 − 2) = 2.
Types of slopes
• There can be two types of slopes in a linear
regression model:
– positive slope
– negative slope
• Different types of regression lines based on the
type of slope include:
– Linear positive slope
– Curve linear positive slope
– Linear negative slope
– Curve linear negative slope
Linear positive slope

• A positive slope always moves upward on a graph from left to right.

Slope = Rise/Run = (Y2 − Y1) / (X2 − X1) = ΔY / ΔX

• Scenario 1 for positive slope: ΔY is positive and ΔX is positive.
• Scenario 2 for positive slope: ΔY is negative and ΔX is negative.
Curve linear positive slope

• The slope for a variable (X) may vary along the curve, but it remains positive throughout; hence, such graphs are called graphs with curve linear positive slope.
Linear negative slope
• A negative slope always moves downward on a graph from left to right.
• As the X value (on the X-axis) increases, the Y value decreases.
• Scenario 1 for negative slope: ΔY is positive and ΔX is negative.
• Scenario 2 for negative slope: ΔY is negative and ΔX is positive.
Curve linear negative slope

• Curves in these graphs slope downward from left to right.
No relationship graph

• When the points are scattered with no pattern, it is very difficult to conclude whether the relationship between X and Y is positive or negative.
Error in simple regression
• X and Y values are provided to the machine, and it
identifies the values of a (intercept) and b (slope) by
relating the values of X and Y.
• However, identifying the exact match of values for a
and b is not always possible.
• There will be some error value (ɛ) associated with it.
• This error is called marginal or residual error:

Y = (a + bX) + ε

• The question, then, is how to decide the parameters of the model (i.e. the values of a and b) for a given problem.
Example of simple regression
A college professor believes that if the grade for the internal examination is high in a class, the grade for the external examination will also be high. A random sample of 15 students in that class was selected, and their internal and external marks were recorded.
• A scatter plot was drawn to explore the relationship
between the independent variable (internal marks)
mapped to X-axis and dependent variable (external
marks) mapped to Y-axis.

Scatter plot and regression line


• The regression line does not predict the data
exactly. Instead, it just cuts through the data.
• Some predictions are lower than expected,
while some others are higher than expected.
• Residual is the distance between the predicted
point (on the regression line) and the actual
point
Residual error

• In simple linear regression, the line is drawn using the regression formula:

Y = (a + bX) + ε
Finding the values of the parameters a, b
• If we know the values of ‘a’ and ‘b’, then it is easy to predict the value of Y for any given X by using the formula Y = (a + bX) + ε.
• But the question is how to calculate the values of ‘a’ and ‘b’ for a given set of X and Y values? A straight line is drawn as close as possible over the points.
• Ordinary Least Squares (OLS) is the technique used to estimate a line that will minimize the error (ε), which is the difference between the predicted and the actual values of Y.
• The Sum of the Squares of the Errors (SSE) is least when b takes the value

b = Σ(Xi − X̄)(Yi − Ȳ) / Σ(Xi − X̄)²
Finding the value of a from b

a = Ȳ − b·X̄
Finding a & b
• Sum of X = 299
• Sum of Y = 852
• Mean X, Mx = 299/15 ≈ 19.93
• Mean Y, My = 852/15 = 56.8
• The OLS formula gives b = 1.89395.

a = My − b·Mx = 56.8 − (1.89395 × 299/15) ≈ 19.0473

ŷ = 1.89395X + 19.0473
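To make this concrete, here is a minimal NumPy sketch of the same OLS calculation. The mark values below are hypothetical placeholders, since the original 15-student data table is not reproduced here.

import numpy as np

# Hypothetical internal (x) and external (y) marks -- placeholder values only
x = np.array([20, 15, 26, 32, 38, 24, 17, 22])
y = np.array([54, 43, 71, 82, 94, 63, 49, 58])

# OLS estimates: b = sum((x - mean_x)(y - mean_y)) / sum((x - mean_x)^2)
b = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
a = y.mean() - b * x.mean()           # a = mean_y - b * mean_x
print("y-hat = {:.5f} * X + {:.4f}".format(b, a))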
Multiple Linear Regression
• In a multiple regression model, two or more independent variables, i.e. predictors, are involved in the model.
• Ŷ = a + b1·X1 + b2·X2 (two predictor variables, namely X1 and X2)
• The model describes a plane in the three-dimensional space of Ŷ, X1, and X2.
• ‘a’ : intercept of this plane.
• ‘b1’ and ‘b2’ : partial regression coefficients.
• Parameter b1 represents the change in the mean response corresponding to a unit change in X1 when X2 is held constant.
• Parameter b2 represents the change in the mean response corresponding to a unit change in X2 when X1 is held constant.
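A minimal scikit-learn sketch of fitting such a plane. The data below is synthetic, generated only so that the fitted intercept and partial regression coefficients can be compared against known values of a, b1 and b2.

import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic data: y = 3.0 + 1.5*X1 - 2.0*X2 + noise (illustrative values only)
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))                 # columns: X1, X2
y = 3.0 + 1.5 * X[:, 0] - 2.0 * X[:, 1] + rng.normal(scale=0.1, size=100)

model = LinearRegression().fit(X, y)
print(model.intercept_)    # estimate of a (close to 3.0)
print(model.coef_)         # estimates of b1, b2 (close to 1.5 and -2.0)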
Polynomial Regression Model
• The polynomial regression model is an extension of the simple linear model, obtained by adding extra predictors formed by raising the original predictor to a power (squaring it, cubing it, and so on).

f(x) = c0 + c1·x + c2·x² + c3·x³

• c0, c1, c2, c3 are the coefficients.
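One way to fit such a model in scikit-learn is sketched below: PolynomialFeatures builds the x, x², x³ predictor columns and LinearRegression then fits the coefficients. The toy data is invented purely for illustration.

import numpy as np
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline

# Toy data following a cubic trend plus noise (illustrative only)
rng = np.random.default_rng(0)
x = np.linspace(-3, 3, 50).reshape(-1, 1)
y = 1 + 2 * x.ravel() - 0.5 * x.ravel() ** 3 + rng.normal(scale=0.5, size=50)

# degree=3 adds the x^2 and x^3 predictor columns automatically
model = make_pipeline(PolynomialFeatures(degree=3), LinearRegression())
model.fit(x, y)
print(model.predict([[2.0]]))    # prediction at x = 2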


Logistic Regression
• Logistic Regression Equation
• The odds are the ratio of something occurring to something not occurring.
• This is different from probability, as probability is the ratio of something occurring to everything that could possibly occur.
• So the odds are: p(x) / (1 − p(x)) = e^z, where z is a linear function of the input (z = b0 + b1·x).
• Logistic regression is the appropriate regression analysis to
conduct when the dependent variable is dichotomous (binary).
• Like all regression analyses, logistic regression is a predictive
analysis.
• It is used to describe data and to explain the relationship
between one dependent binary variable and one or more
nominal, ordinal, interval or ratio-level independent variables.
• The outcome can either be yes or no (2 outputs).
• This regression technique is similar to linear regression and can be used to predict the probabilities for classification problems.
• Odds are nothing but the ratio of the probability of success to the probability of failure. Odds are always positive, which means their range is (0, +∞).
Log of odds
• The problem here is that the range of the odds is restricted to (0, +∞), and we don’t want a restricted range, because a linear function of the predictors can take any real value.
• It is difficult to model a variable that has a restricted range. To remove this restriction we take the log of the odds, which has a range of (−∞, +∞):

log(p(x) / (1 − p(x))) = b0 + b1·x

• Exponentiate both sides and then solve for p.
This gives the logistic function, also called a sigmoid function:

y = e^(b0 + b1·x) / (1 + e^(b0 + b1·x))

x = input value
y = predicted output (a probability between 0 and 1)
b0 = bias or intercept term
b1 = coefficient for input (x)
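As a quick illustration, a minimal Python sketch of this function with made-up coefficients (the b0 and b1 values below are invented, not fitted):

import numpy as np

def predict_probability(x, b0, b1):
    # Logistic (sigmoid) function: maps b0 + b1*x to a probability in (0, 1)
    z = b0 + b1 * x
    return 1.0 / (1.0 + np.exp(-z))

# Made-up coefficients, purely for illustration
print(predict_probability(x=2.0, b0=-1.0, b1=0.8))    # ~0.646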
Differences Between Linear and Logistic Regression

• The graph of a sigmoid function squeezes a straight line into an S-curve.
• The core difference lies in their target predictions.
• Linear regression excels at predicting continuous values along a
spectrum.
• Example: predicting house prices based on size and location – the
resulting output would be a specific dollar amount, a continuous value
on the price scale.
• Logistic regression deals with categories. It doesn’t predict a specific value but rather the likelihood of something belonging to a particular class.
• Example: classifying emails as spam (category 1) or not spam (category 0). The output here would be a probability between 0 (not likely spam) and 1 (very likely spam).
• This probability is then used to assign an email to a definitive category (spam or not spam) based on a chosen threshold, as sketched below.
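A minimal, self-contained sketch of this thresholding step; the synthetic data below merely stands in for the spam example.

import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Synthetic binary data standing in for "spam vs. not spam"
X, y = make_classification(n_samples=100, n_features=4, random_state=0)
model = LogisticRegression().fit(X, y)

probs = model.predict_proba(X)[:, 1]      # P(class = 1) for each email
labels = (probs >= 0.5).astype(int)       # assign a category via a 0.5 threshold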
Key properties of the logistic regression equation

• Sigmoid Function: uses a special “S” shaped curve to predict probabilities. It ensures that the predicted probabilities stay between 0 and 1, which makes sense for probabilities.
• Straightforward Relationship: the relationship between the inputs (like age, height, etc.) and the outcome (like yes/no) is simple to understand: it is linear on the log-odds scale, which the sigmoid then bends into a curve.
• Coefficients: These are just numbers that tell us how much each input affects the outcome in the logistic regression model. For example, if age is a predictor, the coefficient tells us how much the log-odds of the outcome change for every one-year increase in age (see the sketch below).
• Probabilities, Not Certainties: Instead of saying “yes” or “no” directly, logistic regression gives us probabilities, like saying there’s a 70% chance it’s a “yes”. We can then decide on a cutoff point to make our final decision.
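For instance, exponentiating a coefficient converts it from a change in log-odds into an odds multiplier (the coefficient below is hypothetical, not from any fitted model).

import numpy as np

b_age = 0.08                  # hypothetical logistic coefficient for "age"
print(np.exp(b_age))          # ~1.083: each extra year multiplies the odds by about 1.083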
Implementation of Logistic Regression using Python

• Step-1 (import libraries)

# Import necessary libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.datasets import load_diabetes
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix, roc_curve, auc
• Step-2 (read and explore the data)

# Load the diabetes dataset
diabetes = load_diabetes()
X, y = diabetes.data, diabetes.target

# Convert the target variable to binary (1 for diabetes, 0 for no diabetes)
y_binary = (y > np.median(y)).astype(int)

(This converts the continuous target variable y into a binary representation: a patient’s diabetes measure is classified as 1 (indicating diabetes) if it is higher than the median value, and as 0 (showing no diabetes) otherwise.)
• Step-3 (split the dataset into train and test sets)

# Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(
    X, y_binary, test_size=0.2, random_state=42)
• Step-4 (feature scaling)

# Standardize features
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)
• Step-5 (train the model)
• Step-6 (evaluation metrics)

# Train the Logistic Regression model
model = LogisticRegression()
model.fit(X_train, y_train)

# Evaluate the model
y_pred = model.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
print("Accuracy: {:.2f}%".format(accuracy * 100))
