REGRESSION
What is intelligence?
Intelligence can be defined as the ability to comprehend: to understand and profit
from experience.
• Learning
  – "A computer program is said to learn from
    • experience E
    • with respect to some class of tasks T
    • and performance measure P
  – if its performance at tasks in T, as measured by P, improves with experience E." (Mitchell, 1997)
LEARNING ALGORITHMS
• General tasks
  – Classification, regression, transcription, machine translation, etc.
• Performance measures
  – Depend on the type of problem; examples include accuracy and error rate.
  – Performance is measured on a test dataset that is different from the dataset used to train the algorithm (see the sketch after this list).
  – It is often difficult to choose a performance measure that corresponds well to the desired behavior of the system.
• Experience
  – Algorithms are termed supervised or unsupervised learning algorithms based on the kind of experience they are allowed to have with the dataset.
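To make the train/test separation concrete, here is a minimal sketch using scikit-learn; the dataset, model, and split ratio are illustrative assumptions, not part of the slides.

```python
# Task T: digit classification; experience E: the training split;
# performance P: accuracy on held-out test data.
from sklearn.datasets import load_digits
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

X, y = load_digits(return_X_y=True)

# The test set must stay disjoint from the data used for training.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0)

model = LogisticRegression(max_iter=5000)
model.fit(X_train, y_train)                           # learn from experience E
print(accuracy_score(y_test, model.predict(X_test)))  # measure P on task T
```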
EXAMPLE (HANDWRITING RECOGNITION LEARNING PROBLEM)
[Figure: training data is fed to a machine learning algorithm, which learns a model that is then used for prediction]
Source: https://towardsdatascience.com/coding-deep-learning-for-beginners-types-of-machine-learning-b9e651e1ed9d
SUPERVISED MACHINE LEARNING APPROACH
[Figure: the supervised learning approach branches into two task types, classification and regression]
UNSUPERVISED MACHINE LEARNING APPROACH: EXAMPLES & USE CASES
[Figure: the unsupervised learning approach branches into clustering (e.g., DBSCAN), association rule mining (e.g., Eclat), and dimension reduction]
Source: https://www.quora.com/How-is-association-rule-compared-with-collaborative-filtering-in-recommender-systems
REINFORCEMENT LEARNING
[Figure: reinforcement learning illustration]
Source: https://citrusbits.com/killer-deep-learning-softwares/
MACHINE LEARNING VS DEEP LEARNING
[Figure: comparison of the machine learning and deep learning approaches to problem solving]
SUPERVISED LEARNING
• Classification: learning a discrete function. The algorithm attempts to estimate the mapping function from the input variables to discrete or categorical output variables.
• Regression: learning a continuous function. The algorithm attempts to estimate the mapping function from the input variables to a continuous (real-valued) output variable.
Source: https://in.springboard.com/blog/regression-vs-classification-in-machine-learning/
SUPERVISED LEARNING: REGRESSION
[Figure: given a dataset of (x, y) pairs, regression identifies the relationship that maps x to y]
Example: predicting salary after completing the course.

Dependent Variable
A variable whose value changes when the values of the independent variables are manipulated. It is often denoted by y.
CASE STUDY: PREDICTING HOUSE PRICE
BIVARIATE AND MULTIVARIATE MODEL
[Figure: input features from the dataset are mapped to the output price]
• Size of house: $x_1$
• Number of bedrooms: $x_2$
• Age of house: $x_3$
• Price: $y$
SIMPLE/BIVARIATE LINEAR REGRESSION
Simple linear regression concerns two-dimensional sample points with one independent variable and one dependent variable, and finds a linear function (a non-vertical straight line) that, as accurately as possible, predicts the dependent variable values as a function of the independent variable.
The adjective simple refers to the fact that the outcome variable is related to a single predictor.
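To make this concrete, here is a minimal sketch that fits such a line with NumPy's least-squares polynomial fit; the sample data points are made up for illustration.

```python
import numpy as np

# Hypothetical (x, y) sample points, invented for illustration.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 2.9, 4.2, 4.8, 6.1])

# A degree-1 polynomial fit is exactly simple linear regression (least squares).
slope, intercept = np.polyfit(x, y, 1)
print(f"h(x) = {intercept:.3f} + {slope:.3f} * x")
```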
HOW MUCH IS MY HOUSE WORTH?
LOOK AT RECENT SALES IN MY NEIGHBORHOOD
[Figure: each recent sale provides a data point $(x^{(i)}, y^{(i)})$ of house size and sale price]
REGRESSION (HOUSE PRICE PREDICTION)
A scatter plot is a mathematical diagram that displays the values of two variables for a set of data.
• Dependent variable: the price of the house is the dependent/response variable.
• Independent variable: the size of the house is the independent variable.
The equation that describes how the dependent variable $y$ is related to the independent variable $x$ is referred to as a regression equation.
A straight line has the form $y = mx + c$.

Regression equation: $h_\theta(x) = \theta_0 + \theta_1 x$, with $\theta_0, \theta_1$ unknown.
Sample data $(x, y)$ is used to estimate them.
Estimated regression equation: $h_\theta(x) = \hat{\theta}_0 + \hat{\theta}_1 x$, where $\hat{\theta}_0, \hat{\theta}_1$ are now known.
GOAL OF REGRESSION MODEL
Our goal is to learn the model parameters that minimize the error in the model's predictions.
$h_\theta(x) = \theta_0 + \theta_1 x$
[Figure: house price $y$ plotted against size $x$; for each training point the residual is the vertical gap between the observed price $y^{(i)}$ and the prediction $h_\theta(x^{(i)})$, written $y^{(i)} - h_\theta(x^{(i)})$, or $h_\theta(x^{(i)}) - y^{(i)}$ with the opposite sign]
The parameters $\theta_0, \theta_1$ are the regression coefficients.
EFFECTS OF PARAMETERS ON LINE PLACEMENT
Candidate hypotheses:
• $h_\theta(x) = 1.5 + 0 \cdot x$
• $h_\theta(x) = 0 + 0.5 \cdot x$
• $h_\theta(x) = 1 + 0.5 \cdot x$
Data:
x  y
1  1
2  2
3  3
[Figure: the three lines plotted over the data points; $\theta_0$ sets the intercept and $\theta_1$ the slope]
EFFECTS OF PARAMETERS ON LINE PLACEMENT
[Figure: the same three lines over the data, with size of house (x) on the horizontal axis]
Example: suppose $x = 2.5$ and $h_\theta(x) = 1 + 0.5x$. Then $h_\theta(2.5) = 1 + 0.5 \cdot 2.5 = 2.25$. A plotting sketch follows below.
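To see how $\theta_0$ shifts the line and $\theta_1$ tilts it, the three hypotheses can be plotted over the data; a minimal sketch with matplotlib (the plotting choices are illustrative, not from the slides):

```python
import matplotlib.pyplot as plt
import numpy as np

x = np.array([1, 2, 3])
y = np.array([1, 2, 3])
xs = np.linspace(0, 3, 50)

# (theta0, theta1) pairs from the slide: theta0 = intercept, theta1 = slope.
for t0, t1 in [(1.5, 0.0), (0.0, 0.5), (1.0, 0.5)]:
    plt.plot(xs, t0 + t1 * xs, label=f"h(x) = {t0} + {t1}x")

plt.scatter(x, y, color="black", label="data")
plt.xlabel("Size of house (x)")
plt.ylabel("y")
plt.legend()
plt.show()
```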
LEAST SQUARE METHOD
Residual of the $i$-th point: $\varepsilon_i = y^{(i)} - h_\theta(x^{(i)})$

Cost function:
$J(\theta_0, \theta_1) = \frac{1}{2m} \sum_{i=1}^{m} \left(y^{(i)} - h_\theta(x^{(i)})\right)^2$

Objective: $\underset{\theta_0, \theta_1}{\text{minimize}}\ J(\theta_0, \theta_1)$
EXAMPLE
Use the simplified hypothesis $h_\theta(x) = \theta_1 x$ on the two data points $(1, 1)$ and $(2, 2)$, so $m = 2$ and
$J(\theta_1) = \frac{1}{2m} \sum_{i=1}^{m} \left(y^{(i)} - \theta_1 x^{(i)}\right)^2$
For $\theta_1 = 1$:
$J(1) = \frac{1}{2 \cdot 2}\left(0^2 + 0^2\right) = 0$
[Figure: left, $h_\theta(x)$ for fixed $\theta_1 = 1$ passes through both data points; right, $J(\theta_1)$ as a function of $\theta_1$ reaches 0 at $\theta_1 = 1$]
EXAMPLE
$h_\theta(x)$, for fixed $\theta_1$, is a function of $x$; $J(\theta_1)$ is a function of $\theta_1$.
For $\theta_1 = 1.5$:
$J(1.5) = \frac{1}{2 \cdot 2}\left((1 - 1.5)^2 + (2 - 3)^2\right) = \frac{1.25}{4} = 0.3125$
[Figure: left, the line $h_\theta(x) = 1.5x$ overshoots both data points; right, the corresponding value of $J(\theta_1)$ at $\theta_1 = 1.5$]
EXAMPLE
For $\theta_1 = 0.75$:
$J(0.75) = \frac{1}{2 \cdot 2}\left((1 - 0.75)^2 + (2 - 1.5)^2\right) = \frac{0.3125}{4} \approx 0.078$
[Figure: left, the line $h_\theta(x) = 0.75x$ undershoots both data points; right, the corresponding value of $J(\theta_1)$ at $\theta_1 = 0.75$]
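These values can be verified with a few lines of NumPy; a minimal sketch of the cost function above on the two slide data points:

```python
import numpy as np

x = np.array([1.0, 2.0])
y = np.array([1.0, 2.0])
m = len(x)

def J(theta1):
    """Least-squares cost for the simplified hypothesis h(x) = theta1 * x."""
    return np.sum((y - theta1 * x) ** 2) / (2 * m)

for t in (1.0, 1.5, 0.75):
    print(f"J({t}) = {J(t):.4f}")  # J(1.0) = 0.0000, J(1.5) = 0.3125, J(0.75) = 0.0781
```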
COST FUNCTION SURFACE PLOT AND CONTOUR PLOT
Hypothesis: $h_\theta(x) = \theta_0 + \theta_1 x$
Parameters: $\theta_0, \theta_1$
Cost function: $J(\theta_0, \theta_1) = \frac{1}{2m} \sum_{i=1}^{m} \left(y^{(i)} - h_\theta(x^{(i)})\right)^2$
[Figure: surface and contour plots of $J(\theta_0, \theta_1)$ over the $(\theta_0, \theta_1)$ plane]
[Figure: at an extremum of $g(z)$ the slope is 0; $g''(z) \ge 0$ on an interval $[a, b]$ indicates a convex function (a minimum), while $g''(z) < 0$ indicates a concave function (a maximum)]

Example
$g(z) = 5 - (z - 10)^2$
$\frac{dg(z)}{dz} = -2(z - 10) = -2z + 20$
Setting $\frac{dg(z)}{dz} = 0$ gives $z = 10$.
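The stationary point can be checked symbolically; a minimal sketch with SymPy:

```python
import sympy as sp

z = sp.symbols("z")
g = 5 - (z - 10) ** 2

dg = sp.diff(g, z)       # -2*z + 20
print(sp.solve(dg, z))   # [10]
print(sp.diff(g, z, 2))  # -2, negative, so z = 10 is a maximum
```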
FINDING MAXIMUM VIA HILL CLIMBING
At the maximum, the derivative is 0. To its left the slope is positive ($\frac{dg(\theta)}{d\theta} > 0$); to its right the slope is negative ($\frac{dg(\theta)}{d\theta} < 0$). Moving in the direction of the slope therefore climbs toward $\max_\theta g(\theta)$:

While not converged:
$\theta^{(t+1)} \leftarrow \theta^{(t)} + \alpha \frac{dg(\theta)}{d\theta}$
where $t$ is the iteration index and $\alpha$ is the step size.
FINDING MINIMUM VIA HILL DESCENT
To the left of the minimum the slope is negative ($\frac{dg(\theta)}{d\theta} < 0$); to the right it is positive ($\frac{dg(\theta)}{d\theta} > 0$). When the derivative is positive we want to decrease $\theta$, and when the derivative is negative we want to increase $\theta$, so we step against the slope toward $\min_\theta g(\theta)$:
$\theta^{(t+1)} \leftarrow \theta^{(t)} - \alpha \frac{dg(\theta)}{d\theta}$
The step size can be decreased over the iterations, e.g. $\alpha_t = \frac{\beta}{t}$.
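A minimal sketch of this descent loop, using a made-up convex objective $g(\theta) = (\theta - 3)^2$ and a fixed step size for simplicity:

```python
def dg(theta):
    """Derivative of the made-up objective g(theta) = (theta - 3)**2."""
    return 2 * (theta - 3)

theta = 0.0   # arbitrary starting point
alpha = 0.1   # fixed step size (a decaying alpha_t = beta / t also works)
for t in range(1, 1000):
    theta -= alpha * dg(theta)   # step against the slope
    if abs(dg(theta)) < 1e-8:    # converged: derivative ~ 0
        break

print(theta)  # ~3.0, the minimizer
```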
CONVERGENCE CRITERIA
Stop when $\frac{dg(\theta)}{d\theta} = 0$ (in practice, when the magnitude of the derivative falls below a small tolerance).

For linear regression the objective is the cost function
$J(\theta_0, \theta_1) = \frac{1}{2m} \sum_{i=1}^{m} \left(y^{(i)} - h_\theta(x^{(i)})\right)^2$
and the goal is $\underset{\theta_0, \theta_1}{\text{minimize}}\ J(\theta_0, \theta_1)$.
COMPUTE THE GRADIENT
$J(\theta_0, \theta_1) = \frac{1}{2m} \sum_{i=1}^{m} \left(y^{(i)} - h_\theta(x^{(i)})\right)^2$, where $h_\theta(x^{(i)}) = \theta_0 + \theta_1 x^{(i)}$, so

$J(\theta_0, \theta_1) = \frac{1}{2m} \sum_{i=1}^{m} \left(y^{(i)} - (\theta_0 + \theta_1 x^{(i)})\right)^2$

Differentiating with respect to each parameter (chain rule):

$\frac{\partial J(\theta_0, \theta_1)}{\partial \theta_0} = \frac{\partial}{\partial \theta_0}\, \frac{1}{2m} \sum_{i=1}^{m} \left(y^{(i)} - (\theta_0 + \theta_1 x^{(i)})\right)^2 = \frac{1}{m} \sum_{i=1}^{m} \left(y^{(i)} - (\theta_0 + \theta_1 x^{(i)})\right) \cdot (-1)$

$\frac{\partial J(\theta_0, \theta_1)}{\partial \theta_1} = \frac{\partial}{\partial \theta_1}\, \frac{1}{2m} \sum_{i=1}^{m} \left(y^{(i)} - (\theta_0 + \theta_1 x^{(i)})\right)^2 = \frac{1}{m} \sum_{i=1}^{m} \left(y^{(i)} - (\theta_0 + \theta_1 x^{(i)})\right) \cdot (-x^{(i)})$
COMPUTE THE GRADIENT
$J(\theta_0, \theta_1) = \frac{1}{2m} \sum_{i=1}^{m} \left(y^{(i)} - h_\theta(x^{(i)})\right)^2$
Putting it together:
$\nabla J(\theta_0, \theta_1) = \begin{bmatrix} -\frac{1}{m} \sum_{i=1}^{m} \left[y^{(i)} - (\theta_0 + \theta_1 x^{(i)})\right] \\[4pt] -\frac{1}{m} \sum_{i=1}^{m} \left[y^{(i)} - (\theta_0 + \theta_1 x^{(i)})\right] x^{(i)} \end{bmatrix}$
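The analytic gradient can be sanity-checked against finite differences; a minimal sketch, with made-up sample data:

```python
import numpy as np

x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])  # hypothetical data
y = np.array([2.0, 3.0, 5.0, 4.0, 6.0])
m = len(x)

def J(t0, t1):
    return np.sum((y - (t0 + t1 * x)) ** 2) / (2 * m)

def grad(t0, t1):
    r = y - (t0 + t1 * x)                          # residuals
    return np.array([-np.mean(r), -np.mean(r * x)])

# Central finite differences of J at (1.0, 1.0) should match grad(1.0, 1.0).
eps = 1e-6
fd = np.array([(J(1 + eps, 1) - J(1 - eps, 1)) / (2 * eps),
               (J(1, 1 + eps) - J(1, 1 - eps)) / (2 * eps)])
print(grad(1.0, 1.0), fd)
```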
APPROACH 1: SET GRADIENT = 0
$\nabla J(\theta_0, \theta_1) = \begin{bmatrix} -\frac{1}{m} \sum_{i=1}^{m} \left[y^{(i)} - (\theta_0 + \theta_1 x^{(i)})\right] \\[4pt] -\frac{1}{m} \sum_{i=1}^{m} \left[y^{(i)} - (\theta_0 + \theta_1 x^{(i)})\right] x^{(i)} \end{bmatrix} = \mathbf{0}$

Top term: solving for $\theta_0$,
$\theta_0 = \frac{\sum_{i=1}^{m} y^{(i)}}{m} - \theta_1 \frac{\sum_{i=1}^{m} x^{(i)}}{m}$

Bottom term:
$-\frac{1}{m}\left[\sum y^{(i)} x^{(i)} - \theta_0 \sum x^{(i)} - \theta_1 \sum (x^{(i)})^2\right] = 0$
Substituting $\theta_0$ and solving for $\theta_1$:
$\theta_1 = \frac{\sum y^{(i)} x^{(i)} - \frac{\sum y^{(i)} \sum x^{(i)}}{m}}{\sum (x^{(i)})^2 - \frac{\left(\sum x^{(i)}\right)^2}{m}}$

Note: only four sums over the data are needed: $\sum y^{(i)} x^{(i)}$, $\sum x^{(i)}$, $\sum (x^{(i)})^2$, and $\sum y^{(i)}$.
QUESTION 1
Find the least squares regression line for the following data. Also estimate the value of y when x = 10.

x  y
0  2
1  3
2  5
3  4
4  6
SOLUTION
$h_\theta(x) = 2.2 + 0.9x$
At $x = 10$: $h_\theta(10) = 2.2 + 0.9 \cdot 10 = 11.2$
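A minimal sketch of the closed-form solution applied to Question 1; it reproduces $\theta_0 = 2.2$, $\theta_1 = 0.9$, and the prediction 11.2:

```python
import numpy as np

x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([2.0, 3.0, 5.0, 4.0, 6.0])
m = len(x)

# Normal-equation solution obtained by setting the gradient to zero.
theta1 = (np.sum(x * y) - np.sum(x) * np.sum(y) / m) / \
         (np.sum(x ** 2) - np.sum(x) ** 2 / m)
theta0 = np.mean(y) - theta1 * np.mean(x)

print(theta0, theta1)        # 2.2 0.9
print(theta0 + theta1 * 10)  # 11.2
```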
APPROACH 2: GRADIENT DESCENT
$\underset{\theta_0, \theta_1}{\text{minimize}}\ J(\theta_0, \theta_1)$

Outline:
• Start with some initial $\theta_0, \theta_1$.
• Keep changing $\theta_0, \theta_1$ to reduce $J(\theta_0, \theta_1)$ until we hopefully end up at a minimum.

While not converged:
  for $j = 0$ to $1$:
    $\theta_j := \theta_j - \alpha \frac{\partial J(\theta_0, \theta_1)}{\partial \theta_j}$
(both parameters are updated simultaneously, using the gradient at the current point)
GRADIENT DESCENT ALGORITHM
If $\frac{\partial J(\theta_1)}{\partial \theta_1} < 0$ (the slope of the line is negative), then
$\theta_1 := \theta_1 - \alpha \cdot (\text{negative value})$
increases the value of $\theta_1$ by some quantity.
GRADIENT DESCENT ALGORITHM
If $\frac{\partial J(\theta_1)}{\partial \theta_1} > 0$ (the slope of the line is positive), then
$\theta_1 := \theta_1 - \alpha \cdot (\text{positive value})$
decreases the value of $\theta_1$ by some quantity.
GRADIENT DESCENT ALGORITHM
If $\frac{\partial J(\theta_1)}{\partial \theta_1} = 0$ (the slope of the line is 0), then
$\theta_1 := \theta_1 - \alpha \cdot 0$
and $\theta_1$ does not change: the algorithm has converged.
GRADIENT DESCENT ALGORITHM
Substituting the partial derivatives gives the update rules. While not converged:
$\theta_0 := \theta_0 + \alpha \frac{1}{m} \sum_{i=1}^{m} \left(y^{(i)} - h_\theta(x^{(i)})\right)$
$\theta_1 := \theta_1 + \alpha \frac{1}{m} \sum_{i=1}^{m} \left(y^{(i)} - h_\theta(x^{(i)})\right) x^{(i)}$
(The plus sign absorbs the factor of $-1$ in each partial derivative; this is still $\theta_j := \theta_j - \alpha\, \partial J / \partial \theta_j$.)
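A minimal sketch of these update rules applied to the Question 1 data; with a small step size and enough iterations it approaches the closed-form answer $h_\theta(x) = 2.2 + 0.9x$:

```python
import numpy as np

x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([2.0, 3.0, 5.0, 4.0, 6.0])
m = len(x)

theta0, theta1 = 0.0, 0.0  # arbitrary starting point
alpha = 0.05               # step size

for _ in range(20000):     # fixed budget instead of an explicit convergence test
    r = y - (theta0 + theta1 * x)     # residuals, computed once per iteration
    theta0 += alpha * np.mean(r)      # simultaneous update of both parameters,
    theta1 += alpha * np.mean(r * x)  # since both use the same residuals

print(theta0, theta1)      # ~2.2 ~0.9
```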
LINEAR REGRESSION WITH GRADIENT DESCENT
[Figure: the regression line fitted by the gradient descent algorithm, with the intercept, the mean $\bar{y}$, and a residual marked]

$SS_{Total} = \sum \left(y_i - \bar{y}\right)^2$
$SS_{Regression} = \sum \left(y_i - y_{Regression}\right)^2$ (the residual sum of squares)
$SS_{Explained} = \sum \left(y_{Regression} - \bar{y}\right)^2$
EXAMPLE
Regression line: $y = 6x - 5$

x  y   (y - ȳ)²  prediction (6x - 5)  residual  residual²
0  0   169       -5                   5         25
1  1   144       1                    0         0
2  4   81        7                    -3        9
3  9   16        13                   -4        16
4  16  9         19                   -3        9
5  25  144       25                   0         0
6  36  529       31                   5         25

Mean of y: $\bar{y} = 13$. Totals: $SS_{Total} = 1092$, $SS_{Regression} = 84$.
$R^2 = 1 - \frac{SS_{Regression}}{SS_{Total}} = 1 - \frac{84}{1092} = 0.923$
Source: http://www.fairlynerdy.com/what-is-r-squared/
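A minimal sketch computing these quantities for the table above:

```python
import numpy as np

x = np.arange(7)
y = x ** 2            # the table's y values: 0, 1, 4, 9, 16, 25, 36
pred = 6 * x - 5      # the slide's regression line y = 6x - 5

ss_total = np.sum((y - y.mean()) ** 2)   # 1092
ss_regression = np.sum((y - pred) ** 2)  # 84 (residual sum of squares)
r_squared = 1 - ss_regression / ss_total

print(ss_total, ss_regression, round(r_squared, 3))  # 1092 84 0.923
```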