
MACHINE LEARNING

WHY MACHINE LEARNING WAS INTRODUCED

 Statistics: How do we efficiently train large, complex models?

 Computer Science & Artificial Intelligence: How do we train more robust versions of AI systems?

 Neuroscience: How do we design operational models of the brain?


CAN YOU RECOGNIZE THESE PICTURES?

If yes, how do you recognize them?


ORIGIN OF MACHINE LEARNING

……… lies in the very effort of understanding intelligence.

What is intelligence?
It can be defined as the ability to comprehend; to understand and profit
from experience.

The capability to acquire and apply knowledge.


LEARNING? 2300 YEARS AGO……

 Plato (427 – 347 BC)

 Abstract concepts are known
to us a priori, through a mystic
connection with the world.
 He concluded that the ability to think is founded
on a priori knowledge of concepts.
LEARNING?
 Plato’s pupil
 Aristotle (384 – 322 BC)
 Criticized his teacher’s theory
for failing to take into account an
important aspect:
the ability to learn and adapt to a
changing world.
MACHINE LEARNING
 Machine learning is a subset of AI that uses statistical
methods to enable machines to improve with experience.

• Learning –
– “A computer program is said to learn from
• experience E
• with respect to some class of tasks T
• and performance measure P
– if its performance at tasks in T, as measured by P, improves with experience
E.” (Mitchell, 1997)
LEARNING ALGORITHMS…
• General tasks
– Classification, regression, transcription, machine translation, etc.

• Performance measures
– Depend on the type of problem; examples include
• accuracy, error rate, etc.
– Performance is measured on a dataset called the test dataset, which is different
from the dataset used to train the algorithm.
– It is often difficult to choose a performance measure that corresponds well to the
desired behavior of the system.

• Experience
– Algorithms are termed supervised or unsupervised learning
algorithms based on the experience they are allowed to have with datasets.
EXAMPLE (HANDWRITING RECOGNITION LEARNING PROBLEM)

 Task T: Recognizing and classifying handwritten words within images

 Performance measure P: Percentage of words correctly classified

 Training experience E: A database of handwritten words with given
classifications
MACHINE LEARNING

• Learning from experience on data to make predictions.

[Diagram: training phase: Data → Machine Learning algorithm → Trained model;
prediction phase: Unseen data → Trained model → Prediction]
BRANCHES OF MACHINE LEARNING

Source: https://towardsdatascience.com/coding-deep-learning-for-beginners-types-of-machine-learning-b9e651e1ed9d
SUPERVISED MACHINE LEARNING APPROACH

 For each specific task:

 We collect lots of examples with their known outcomes
 Learn a function that maps inputs to outputs
 These programs tend to be data centric, i.e. driven by the learning
examples, and try to learn a hypothesis function that
describes the mapping as closely as possible.
SUPERVISED MACHINE LEARNING APPROACH

 We collect lots of examples with their
known outcomes.
 Learn a function that maps inputs to
outputs.

 Supervised learning models try to find
parameter values that allow them to
perform well on historical data. They
are then used for making predictions on unseen
data that was not part of the training dataset.
There are two main problems that can be solved with supervised
learning: regression and classification.

Regression                      Classification
Linear Regression               Logistic Regression
Multiple Linear Regression      K-Nearest Neighbors
Polynomial Linear Regression    Support Vector Machine
Support Vector Regression       Naïve Bayes
Decision Tree Regression        Decision Tree Classification
Random Forest Regression        Random Forest Classification
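To make the split concrete, here is a minimal scikit-learn sketch (not part of the original slides; it assumes scikit-learn is installed, and the toy data and values are invented for illustration) showing that the two problem types differ mainly in the target being predicted:

```python
# Hedged illustration: regression predicts a continuous target,
# classification a categorical one. Toy data invented for demonstration.
from sklearn.linear_model import LinearRegression, LogisticRegression

X = [[1], [2], [3], [4]]                                 # one input feature per example

reg = LinearRegression().fit(X, [1.1, 1.9, 3.2, 3.9])   # continuous outcomes
print(reg.predict([[5]]))                                # roughly 5.0

clf = LogisticRegression().fit(X, [0, 0, 1, 1])          # class labels
print(clf.predict([[5]]))                                # class 1
```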


SUPERVISED EXAMPLE & USE CASES

UNSUPERVISED EXAMPLES & USE CASES
UNSUPERVISED MACHINE LEARNING APPROACH

 Finding patterns in data

 Draw inferences from unlabeled data (without reference to
known or labeled outcomes).
 Models based on this type of algorithm can be used for
discovering unknown data patterns and the structure of the data itself.
CLUSTERING
ASSOCIATION RULE MINING

Source: https://www.quora.com/How-is-association-rule-compared-with-collaborative-filtering-in-recommender-systems
DIMENSION REDUCTION METHOD

Clustering: K-Means, Hierarchical, DBSCAN
Association Rule Mining: Apriori, FP-Growth, Eclat
Dimension Reduction: PCA, LDA
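As a hedged sketch (not from the slides; it assumes scikit-learn and NumPy are installed, and the toy data is invented), two methods from the lists above can be exercised like this:

```python
# Hedged illustration of unsupervised methods named above.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
# Two invented blobs of 3-D points; no labels are given to the algorithms.
X = np.vstack([rng.normal(0, 1, (20, 3)), rng.normal(5, 1, (20, 3))])

labels = KMeans(n_clusters=2, n_init=10).fit_predict(X)  # clustering
X_2d = PCA(n_components=2).fit_transform(X)              # dimension reduction
print(labels, X_2d.shape)                                # cluster ids, (40, 2)
```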
UNSUPERVISED EXAMPLE & USE CASES
REINFORCEMENT LEARNING

 Reinforcement learning is a type of machine learning where an agent learns to behave
in an environment by performing actions and observing the results.
 Exploration (trial and error)
 Exploitation (knowledge gained from the environment)
DEEP LEARNING

• The difference in artificial intelligence approaches over
two decades (1997-2017):
– 1997: The IBM chess computer Deep Blue was explicitly
programmed to win against the grandmaster Garry Kasparov.
– 2017: AlphaGo was not preprogrammed to play Go.
– It learned using a general-purpose algorithm that allowed it to
interpret the game’s patterns.
• The AlphaGo program applied deep learning.
DEEP LEARNING

 Deep learning is a newer area of machine learning research,
introduced with the objective of moving
machine learning closer to its original goal:
artificial intelligence.

 It is inspired by the functionality of our brain cells, called
neurons, which led to the concept of artificial neural networks.
DEEP LEARNING

Source: https://citrusbits.com/killer-deep-learning-softwares/
MACHINE LEARNING VS DEEP LEARNING

Deep learning IS machine learning. The two are commonly compared along:

• Data dependency
• Hardware requirements
• Execution time
• Feature engineering
• Interpretability
• Problem-solving approach
REGRESSION
SUPERVISED LEARNING

 Learning a discrete function (classification): the
algorithm attempts to estimate the mapping
function from the input variables to
discrete or categorical output variables.

 Learning a continuous function (regression): the
algorithm attempts to estimate the mapping
function from the input variables to
numeric or continuous output variables.
CLASSIFICATION VS REGRESSION

[Figure: examples of classification (discrete classes) and regression (continuous fit)]
Source: https://in.springboard.com/blog/regression-vs-classification-in-machine-learning/
SUPERVISED LEARNING

Image Source: https://www.javatpoint.com/supervised-machine-learning


WHAT IS REGRESSION

 It is used to predict target variables on a continuous scale.

[Diagram: Dataset → Regression → identify the relationship, mapping x → y]
SALARY AFTER COMPLETING THE COURSE

How much will your salary be? (y)

Depends on x = performance in the course, quality of projects, etc.
TWEET POPULARITY

 How many people will retweet your tweet? (y)

 Depends on x = # of followers, # of followers of followers, features of the text tweeted,
popularity of hashtags, # of past retweets, …
REGRESSION ANALYSIS

 Regression analysis is a statistical tool for investigating the
relationship between a dependent variable and one or more
independent (explanatory) variables.

 Regression analysis is widely used for prediction and
forecasting.
INDEPENDENT AND DEPENDENT VARIABLE

 Independent variable (explanatory variable):
A variable whose value does not change under the effect of other variables and
is used to manipulate the dependent (target) variable. It is often denoted
by X.

 Dependent variable:
A variable whose value changes when there is any manipulation of the
values of the independent variable. It is often denoted by Y.
CASE STUDY: PREDICTING HOUSE PRICE

 Size of house (sq ft) is the independent variable, also
known as the control variable.

 Price of house is the dependent variable, also known as the
response variable.
CASE STUDY: PREDICTING HOUSE PRICE

[Diagram: the house-price dataset is fed to regression.]
BIVARIATE AND MULTIVARIATE MODEL

 Bivariate or simple regression model:

Size of house (X) → Price (Y)

 Multivariate or multiple regression model:

Size of house (X1), # of bedrooms (X2), age of house (X3) → Price (Y)
SIMPLE/BIVARIATE LINEAR REGRESSION

 Simple linear regression is a linear regression model with a single explanatory
variable.

 It concerns two-dimensional sample points with one independent variable and one
dependent variable, and finds a linear function (a non-vertical straight line) that, as
accurately as possible, predicts the dependent variable values as a function of the
independent variable.

 The adjective simple refers to the fact that the outcome variable is related to a
single predictor.
HOW MUCH IS MY HOUSE WORTH?
LOOK AT RECENT SALES IN MY NEIGHBORHOOD

 How much did they sell for?

Each recent sale gives a data point $(x^{(i)}, y^{(i)})$: the size of house i and the
price it sold for.
REGRESSION (HOUSE PRICE PREDICTION)

A scatter plot is a mathematical diagram that displays values of two variables for a
set of data; each point is a pair $(x^{(i)}, y^{(i)})$.

 Size of house (sq ft) is the independent variable (control variable);
price of house is the dependent variable (response variable).

 Scatter plots are used to investigate the
relationship between the variables.
SIMPLE LINEAR REGRESSION
House price prediction:
we want to fit the best line (a linear function
y = f(x)) to explain the data.
SIMPLE LINEAR REGRESSION

 The equation that describes how the dependent variable (y) is related to the
independent variable (x) is referred to as the regression equation:
$y = mx + c$

 The simple linear regression model is:

$h_\theta(x) = \theta_0 + \theta_1 x$
• x is the independent variable
• The parameters/regression coefficients are $\theta_0$ (intercept) and $\theta_1$ (slope)
REGRESSION

The simple linear regression equation is

$h_\theta(x) = \theta_0 + \theta_1 x$

It represents the relationship between input (x) and output (y).
[Plot: house price (y) against size of house (x), with the fitted line $h_\theta(x)$.]

1. The regression equation is a straight line
2. $\theta_0$ is the intercept of the regression line
3. $\theta_1$ is the slope of the regression line
4. $h_\theta(x)$ is the hypothesis of the model
ESTIMATION PROCESS

Regression equation: $h_\theta(x) = \theta_0 + \theta_1 x$, with $\theta_0, \theta_1$ unknown.

Sample data $(x, y)$ is used to estimate the parameters,
yielding the estimated regression equation
$h_\theta(x) = \theta_0 + \theta_1 x$ with $\theta_0, \theta_1$ now known.
GOAL OF REGRESSION MODEL

 Our goal is to learn the model parameters that minimize the error in the
model’s predictions.

[Plot: house price (y) against size of house (x), with the fitted line
$h_\theta(x) = \theta_0 + \theta_1 x$; for each training point, the vertical gap between
the observed value $y^{(i)}$ and the fitted value $h_\theta(x^{(i)})$ is the prediction error.]

 To find the best parameters:
 Define the cost function, or loss function, that measures how inaccurate our
model’s predictions are.
 The error on the ith example is $y^{(i)} - h_\theta(x^{(i)})$
(equivalently $h_\theta(x^{(i)}) - y^{(i)}$, with opposite sign).


SIMPLE LINEAR REGRESSION

Parameters (regression coefficients): $\theta_0, \theta_1$

Hypothesis: $h_\theta(x) = \theta_0 + \theta_1 x$
EFFECTS OF PARAMETERS ON LINE PLACEMENT

Candidate lines for the data below:
$h_\theta(x) = 1.5 + 0 \cdot x$
$h_\theta(x) = 0 + 0.5 \cdot x$
$h_\theta(x) = 1 + 0.5 \cdot x$

x  y
1  1
2  2
3  3

Example
Suppose x = 2.5 and $h_\theta(x) = 1 + 0.5 \cdot x$.

Predict the outcome:
$h_\theta(x) = 1 + 0.5 \cdot 2.5 = 2.25$
ESTIMATION PROCESS

[Scatter plot: house price (y) against size of house (x).]
LEAST SQUARE METHOD

 One of the most common estimation
techniques for linear regression is least
squares estimation.

 The least squares method is a statistical
procedure to find the best fit for a set
of data points by minimizing the sum
of the squared offsets (residuals) of the points
from the fitted curve.
LEAST SQUARE METHOD

$y^{(i)} = \theta_0 + \theta_1 x^{(i)} + \varepsilon^{(i)}$

$\varepsilon^{(i)} = y^{(i)} - h_\theta(x^{(i)})$ is the residual error in the ith observation.

$J(\theta_0, \theta_1) = (y^{(1)} - h_\theta(x^{(1)}))^2 + (y^{(2)} - h_\theta(x^{(2)}))^2 + (y^{(3)} - h_\theta(x^{(3)}))^2 + \cdots$,
including all training houses.

So, our aim is to minimize the total error:

$J(\theta_0, \theta_1) = \frac{1}{2m} \sum_{i=1}^{m} (y^{(i)} - h_\theta(x^{(i)}))^2$   (cost function)

$\underset{\theta_0, \theta_1}{\text{minimize}}\ J(\theta_0, \theta_1)$
EXAMPLE

 Let’s keep only one parameter, $\theta_1$ (fix $\theta_0 = 0$, so $h_\theta(x) = \theta_1 x$):

$J(\theta_1) = \frac{1}{2m} \sum_{i=1}^{m} (y^{(i)} - \theta_1 x^{(i)})^2$

 Goal: $\underset{\theta_1}{\text{minimize}}\ J(\theta_1)$
EXAMPLE

Training data: (x, y) = (1, 1) and (2, 2). Model: $h_\theta(x) = \theta_1 \cdot x$.

 $h_\theta(x)$: for fixed $\theta_1$, this is a function of x.
 $J(\theta_1)$ is a function of $\theta_1$.

With $\theta_1 = 1$, the line passes exactly through both points:

$J(\theta_1) = \frac{1}{2m} \sum_{i=1}^{m} (y^{(i)} - \theta_1 x^{(i)})^2 = \frac{1}{2 \cdot 2}(0^2 + 0^2) = 0$
EXAMPLE

With $\theta_1 = 1.5$, the predictions at x = 1 and x = 2 are 1.5 and 3:

$J(\theta_1) = \frac{1}{2 \cdot 2}\left((1 - 1.5)^2 + (2 - 3)^2\right) = \frac{1.25}{4} = 0.3125$
EXAMPLE

With $\theta_1 = 0.75$, the predictions at x = 1 and x = 2 are 0.75 and 1.5:

$J(\theta_1) = \frac{1}{2 \cdot 2}\left((1 - 0.75)^2 + (2 - 1.5)^2\right) = \frac{0.3125}{4} \approx 0.078$
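A minimal sketch (not from the slides) of the cost function on the two training points (1, 1) and (2, 2); it reproduces the three values of $J(\theta_1)$ worked out above:

```python
# Cost J(theta1) = (1 / 2m) * sum over i of (y_i - theta1 * x_i)^2,
# evaluated on the two training points used in the slides.
import numpy as np

x = np.array([1.0, 2.0])
y = np.array([1.0, 2.0])

def cost(theta1):
    m = len(x)
    return np.sum((y - theta1 * x) ** 2) / (2 * m)

for t1 in (1.0, 1.5, 0.75):
    print(t1, cost(t1))   # 1.0 -> 0.0, 1.5 -> 0.3125, 0.75 -> 0.078125
```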
COST FUNCTION SURFACE PLOT
CONTOUR PLOT

 Contour plots are also known as level plots.

 They are used to visualize the change in $J(\theta_0, \theta_1)$ as a function of the
two inputs $\theta_0$ and $\theta_1$:
$J(\theta_0, \theta_1) = f(\theta_0, \theta_1)$

 For a function $f(\theta_0, \theta_1)$ of two variables,
assign different colors to different
values of f.

 Pick some values to plot. The result will
be contours: curves in the graph along
which the values of $f(\theta_0, \theta_1)$ are constant.
EXAMPLE

[Three paired plots: on the left, $h_\theta(x)$ for fixed $\theta_0, \theta_1$ as a
function of x; on the right, $J(\theta_0, \theta_1)$ as a function of the
parameters $\theta_0, \theta_1$, shown as a contour plot.]
SUMMARY

Hypothesis: $h_\theta(x) = \theta_0 + \theta_1 x$

Parameters: $\theta_0, \theta_1$

Cost function: $J(\theta_0, \theta_1) = \frac{1}{2m} \sum_{i=1}^{m} (y^{(i)} - h_\theta(x^{(i)}))^2$

Goal: $\underset{\theta_0, \theta_1}{\text{minimize}}\ J(\theta_0, \theta_1)$
CONVEX AND CONCAVE FUNCTION

Convex function: $g''(z) \ge 0$; on an interval (a, b) the minimum occurs where the slope is 0.
Concave function: $g''(z) < 0$; on an interval (a, b) the maximum occurs where the slope is 0.

Example
$g(z) = 5 - (z - 10)^2$

$\frac{dg(z)}{dz} = 0 - 2(z - 10) = -2z + 20$

Set $\frac{dg(z)}{dz} = 0$:
$z = 10$
FINDING MAXIMUM VIA HILL CLIMBING

At the maximum, the derivative is 0.

How do we know whether to move θ to the right
or to the left (increase or decrease θ)?
If $\frac{dg(\theta)}{d\theta} > 0$ (positive slope), increase θ;
if $\frac{dg(\theta)}{d\theta} < 0$ (negative slope), decrease θ.

While not converged:
$\theta^{(t+1)} \leftarrow \theta^{(t)} + \alpha \frac{dg(\theta^{(t)})}{d\theta}$
(t indexes the iteration; α is the step size; the update moves toward max g(θ).)
FINDING MINIMUM VIA HILL DESCENT

At the minimum, the derivative is 0.

When the derivative is positive, we want to decrease
θ, and when the derivative is negative, we want to
increase θ.

While not converged:
$\theta^{(t+1)} \leftarrow \theta^{(t)} - \alpha \frac{dg(\theta^{(t)})}{d\theta}$
(t indexes the iteration; α is the step size; the update moves toward min g(θ).)
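A minimal sketch (not from the slides) of hill descent on the convex function $f(\theta) = (\theta - 10)^2$, the mirror image of the earlier concave example $g(z) = 5 - (z - 10)^2$; the starting point, step size, and tolerance are arbitrary illustrative choices:

```python
# Hill descent: theta^(t+1) <- theta^(t) - alpha * df/dtheta,
# on f(theta) = (theta - 10)^2, whose minimum is at theta = 10.
def df(theta):
    return 2.0 * (theta - 10.0)        # derivative of f

theta, alpha = 0.0, 0.1                # arbitrary start and step size
while abs(df(theta)) >= 1e-6:          # stop when the derivative is near zero
    theta -= alpha * df(theta)         # move against the slope
print(theta)                           # approximately 10.0
```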
STEP SIZE/LEARNING RATE (α)

 With a fixed learning rate we slowly reach the optimum position.

Small step size
Advantage: will converge to the global optimum (for a convex objective)
Disadvantage: slow convergence

Large step size
Advantage: moves fast toward the optimum
Disadvantage: may overshoot the optimum point
STEP SIZE/LEARNING RATE (α)

 Decreasing step size: the step size is scheduled over iterations t.

Common choice:
$\alpha_t = \frac{\beta}{t}$
CONVERGENCE CRITERIA

 For a convex function, the optimum occurs when

$\frac{dg(\theta)}{d\theta} = 0$

In practice, stop when

$\left|\frac{dg(\theta)}{d\theta}\right| < \epsilon$

While not converged:
$\theta^{(t+1)} \leftarrow \theta^{(t)} - \alpha \frac{dg(\theta^{(t)})}{d\theta}$
FINDING THE LEAST SQUARES LINE

$J(\theta_0, \theta_1) = \frac{1}{2m} \sum_{i=1}^{m} (y^{(i)} - h_\theta(x^{(i)}))^2$   (cost function)

$\underset{\theta_0, \theta_1}{\text{minimize}}\ J(\theta_0, \theta_1)$

This cost function is convex, so the solution for $\theta_0, \theta_1$ is unique and
gradient descent will converge to the minimum.
COMPUTE THE GRADIENT

$J(\theta_0, \theta_1) = \frac{1}{2m} \sum_{i=1}^{m} (y^{(i)} - h_\theta(x^{(i)}))^2$, with $h_\theta(x^{(i)}) = \theta_0 + \theta_1 x^{(i)}$

$J(\theta_0, \theta_1) = \frac{1}{2m} \sum_{i=1}^{m} \left(y^{(i)} - (\theta_0 + \theta_1 x^{(i)})\right)^2$

Differentiating with respect to each parameter (chain rule):

$\frac{\partial J(\theta_0, \theta_1)}{\partial \theta_0} = \frac{1}{2m} \sum_{i=1}^{m} 2\left(y^{(i)} - (\theta_0 + \theta_1 x^{(i)})\right)(-1) = -\frac{1}{m} \sum_{i=1}^{m} \left(y^{(i)} - (\theta_0 + \theta_1 x^{(i)})\right)$

$\frac{\partial J(\theta_0, \theta_1)}{\partial \theta_1} = \frac{1}{2m} \sum_{i=1}^{m} 2\left(y^{(i)} - (\theta_0 + \theta_1 x^{(i)})\right)(-x^{(i)}) = -\frac{1}{m} \sum_{i=1}^{m} \left(y^{(i)} - (\theta_0 + \theta_1 x^{(i)})\right) x^{(i)}$
COMPUTE THE GRADIENT

Putting it together:

$\nabla J(\theta_0, \theta_1) = \begin{bmatrix} -\frac{1}{m} \sum_{i=1}^{m} \left[y^{(i)} - (\theta_0 + \theta_1 x^{(i)})\right] \\ -\frac{1}{m} \sum_{i=1}^{m} \left[y^{(i)} - (\theta_0 + \theta_1 x^{(i)})\right] x^{(i)} \end{bmatrix}$
APPROACH 1: SET GRADIENT = 0

Setting $\nabla J(\theta_0, \theta_1) = 0$ and solving:

Top term:
$\theta_0 = \frac{\sum_{i=1}^{m} y^{(i)}}{m} - \theta_1 \frac{\sum_{i=1}^{m} x^{(i)}}{m}$

Bottom term (substituting $\theta_0$):
$\theta_1 = \dfrac{\sum y^{(i)} x^{(i)} - \frac{\sum y^{(i)} \sum x^{(i)}}{m}}{\sum (x^{(i)})^2 - \frac{\left(\sum x^{(i)}\right)^2}{m}}$

Note: the closed form only requires the sums $\sum y^{(i)} x^{(i)}$, $\sum x^{(i)}$, $\sum (x^{(i)})^2$, and $\sum y^{(i)}$.
QUESTION 1

Find the least squares regression line for the following data.
Also estimate the value of y when x = 10.

X Y
0 2
1 3
2 5
3 4
4 6
SOLUTION

$h_\theta(x) = 2.2 + 0.9x$

At x = 10:

$h_\theta(10) = 2.2 + 0.9 \cdot 10 = 11.2$
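As a check (not part of the slides), a short NumPy sketch of Approach 1 on the Question 1 data reproduces this answer:

```python
# Closed-form least squares (gradient set to zero) on the Question 1 data.
import numpy as np

x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([2.0, 3.0, 5.0, 4.0, 6.0])
m = len(x)

theta1 = (np.sum(x * y) - np.sum(x) * np.sum(y) / m) / (
    np.sum(x ** 2) - np.sum(x) ** 2 / m)
theta0 = np.mean(y) - theta1 * np.mean(x)

print(theta0, theta1)            # 2.2 0.9
print(theta0 + theta1 * 10)      # 11.2
```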
APPROACH 2: GRADIENT DESCENT

Gradient descent is an optimization algorithm used to find the values of the parameters
(coefficients) of a function f that minimize a cost function.
GRADIENT DESCENT

 Gradient descent algorithm
 Get estimated parameters
 Intercept
 Slope
 Used to form predictions

Have some function
$J(\theta_0, \theta_1) = \frac{1}{2m} \sum_{i=1}^{m} (y^{(i)} - h_\theta(x^{(i)}))^2$
with $h_\theta(x^{(i)}) = \theta_0 + \theta_1 x^{(i)}$

$\underset{\theta_0, \theta_1}{\text{minimize}}\ J(\theta_0, \theta_1)$

Outline:
Start with some $\theta_0, \theta_1$.
Keep changing $\theta_0, \theta_1$ to reduce $J(\theta_0, \theta_1)$ until we hopefully
end up at a minimum.

While not converged:
    for j = 0 to 1:
        $\theta_j := \theta_j - \alpha \frac{\partial J(\theta_0, \theta_1)}{\partial \theta_j}$
GRADIENT DESCENT ALGORITHM

If the slope of the line is negative, $\frac{\partial J(\theta_1)}{\partial \theta_1} < 0$:

$\theta_1 := \theta_1 - \alpha \cdot (\text{negative value})$

increases the value of $\theta_1$ by some quantity.
GRADIENT DESCENT ALGORITHM

If the slope of the line is positive, $\frac{\partial J(\theta_1)}{\partial \theta_1} > 0$:

$\theta_1 := \theta_1 - \alpha \cdot (\text{positive value})$

decreases the value of $\theta_1$ by some quantity.
GRADIENT DESCENT ALGORITHM

If the slope of the line is 0, $\frac{\partial J(\theta_1)}{\partial \theta_1} = 0$:

$\theta_1 := \theta_1 - \alpha \cdot 0$

and $\theta_1$ does not change.
GRADIENT DESCENT ALGORITHM

While not converged:

$\theta_0 := \theta_0 + \alpha \frac{1}{m} \sum_{i=1}^{m} \left(y^{(i)} - h_\theta(x^{(i)})\right)$

$\theta_1 := \theta_1 + \alpha \frac{1}{m} \sum_{i=1}^{m} \left(y^{(i)} - h_\theta(x^{(i)})\right) x^{(i)}$
LINEAR REGRESSION WITH GRADIENT DESCENT

 Linear regression model:

$h_\theta(x^{(i)}) = \theta_0 + \theta_1 x^{(i)}$

$J(\theta_0, \theta_1) = \frac{1}{2m} \sum_{i=1}^{m} (y^{(i)} - h_\theta(x^{(i)}))^2$

 Gradient descent algorithm:

While not converged:
    for j = 0 to 1:
        $\theta_j := \theta_j - \alpha \frac{\partial J(\theta_0, \theta_1)}{\partial \theta_j}$
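A minimal sketch (not from the slides) of batch gradient descent for this model, run on the Question 1 data; the learning rate and iteration count are arbitrary illustrative choices:

```python
# Batch gradient descent for h(x) = theta0 + theta1 * x.
# Each iteration uses all training examples; should approach 2.2 and 0.9.
import numpy as np

x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([2.0, 3.0, 5.0, 4.0, 6.0])

theta0, theta1, alpha = 0.0, 0.0, 0.05
for _ in range(20000):
    error = y - (theta0 + theta1 * x)       # y(i) - h(x(i)) for every example
    theta0 += alpha * np.mean(error)        # from dJ/dtheta0
    theta1 += alpha * np.mean(error * x)    # from dJ/dtheta1

print(theta0, theta1)                       # close to 2.2 and 0.9
```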
GRADIENT DESCENT ALGORITHM
 Types of gradient descent algorithms

 Stochastic gradient descent (SGD)
 SGD randomly picks one data point from the whole dataset at each iteration.

 Batch gradient descent
 Every step of gradient descent uses all the training examples.

 Mini-batch gradient descent
 A balance between the stability of batch gradient descent and the speed of SGD.
 Samples a small number of data points, instead of just one, at each step
(see the sketch below).
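A hedged mini-batch sketch (not from the slides; the batch size, learning rate, and iteration count are arbitrary choices), reusing the Question 1 data:

```python
# Mini-batch gradient descent: each step uses a small random sample.
import numpy as np

rng = np.random.default_rng(0)
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([2.0, 3.0, 5.0, 4.0, 6.0])

theta0, theta1, alpha, batch = 0.0, 0.0, 0.01, 2
for _ in range(50000):
    idx = rng.choice(len(x), size=batch, replace=False)  # sample a mini-batch
    error = y[idx] - (theta0 + theta1 * x[idx])
    theta0 += alpha * np.mean(error)
    theta1 += alpha * np.mean(error * x[idx])

print(theta0, theta1)   # noisy estimates near 2.2 and 0.9
```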
COEFFICIENT OF DETERMINATION ($r^2$)

 Quantifies the goodness of a fit.

 $r^2$
 is a measure of how closely the data
points fit the regression line.

 In other words, it represents the
fraction of variance in the dependent
variable (response) that has been
explained by the regression model.

 R-squared is a way of measuring how much better than the mean line
you have done, based on the summed squared error.
Our objective is to do better than the mean. For instance, this regression line will give a
lower summed squared error than using the horizontal (mean) line.
Ideally, you would have zero regression error, i.e. your regression line would perfectly
match the data. In that case you would get an r-squared value of 1.
In the plot, each observed ("actual") point $y_i$ is compared with the fitted value
$y_{Regression,i}$ on the line and with the mean line $\bar{y}$:

$SS_{Regression} = \sum_i (y_i - y_{Regression,i})^2$   (residual error around the fitted line)

$SS_{Total} = \sum_i (y_i - \bar{y})^2$   (total error around the mean line)

$SS_{Explained} = \sum_i (y_{Regression,i} - \bar{y})^2$

so that $r^2 = 1 - \frac{SS_{Regression}}{SS_{Total}}$.
EXAMPLE

Regression line: $y_{Regression} = 6x - 5$

X   Y    $(Y - \bar{y})^2$   $y_{Regression}$   $Y - y_{Regression}$   $(Y - y_{Regression})^2$
0   0    169                 -5                 5                      25
1   1    144                 1                  0                      0
2   4    81                  7                  -3                     9
3   9    16                  13                 -4                     16
4   16   9                   19                 -3                     9
5   25   144                 25                 0                      0
6   36   529                 31                 5                      25

Average $\bar{y}$ = 13; $SS_{Total}$ = 1092; $SS_{Regression}$ = 84

$r^2 = 1 - \frac{84}{1092} = 0.923$

Source: http://www.fairlynerdy.com/what-is-r-squared/
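A short sketch (not from the slides) that reproduces the table's computation:

```python
# r-squared for the table above: y = x^2 fitted by the line y_hat = 6x - 5.
import numpy as np

x = np.arange(7)
y = x ** 2
y_hat = 6 * x - 5

ss_total = np.sum((y - y.mean()) ** 2)         # 1092
ss_regression = np.sum((y - y_hat) ** 2)       # 84
print(1 - ss_regression / ss_total)            # 0.923...
```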
