Linear Regression
Machine Learning
h: the hypothesis function, mapping an input x (e.g. house size) to a predicted output y (price).
How do we represent the hypothesis h? As a linear function:

$h_\theta(x) = \theta_0 + \theta_1 x$

The $\theta_i$ are the parameters:
- $\theta_0$ is the intercept (the value of $h$ when $x = 0$)
- $\theta_1$ is the gradient (slope) of the line
Parameters: $\theta_0, \theta_1$

Cost function (squared error):
$J(\theta_0, \theta_1) = \frac{1}{2m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right)^2$

Goal: $\min_{\theta_0, \theta_1} J(\theta_0, \theta_1)$

(For fixed $\theta_0, \theta_1$, $h_\theta(x)$ is a function of $x$; $J(\theta_0, \theta_1)$ is a function of the parameters $\theta_0, \theta_1$.)
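A minimal NumPy sketch of this cost function (the function and variable names are my own, not from the slides):

```python
import numpy as np

def compute_cost(x, y, theta0, theta1):
    """Squared-error cost J(theta0, theta1) for h(x) = theta0 + theta1 * x."""
    m = len(y)                          # number of training examples
    predictions = theta0 + theta1 * x   # h_theta(x^(i)) for every example
    return np.sum((predictions - y) ** 2) / (2 * m)

# Toy data lying exactly on the line y = 2x.
x = np.array([1.0, 2.0, 3.0])
y = np.array([2.0, 4.0, 6.0])
print(compute_cost(x, y, 0.0, 2.0))   # 0.0 -- this line fits perfectly
print(compute_cost(x, y, 0.0, 0.0))   # larger cost for a poor fit
```

Parameters with zero cost reproduce the data exactly; any other choice of $\theta_0, \theta_1$ gives a strictly larger $J$.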
[Figure: training data, house price ($) in 1000's (y) vs. size in feet² (x)]
$J(\theta_0, \theta_1)$ can be visualized as a contour plot (contour figure).
Minimizing a function

Want: $\min_{\theta_0, \theta_1} J(\theta_0, \theta_1)$

Outline:
• Start with some initial $\theta_0, \theta_1$ (say $\theta_0 = 0$, $\theta_1 = 0$)
• Keep changing $\theta_0, \theta_1$ to reduce $J(\theta_0, \theta_1)$, until we hopefully end up at a minimum

[Figure: surface plot of $J(\theta_0, \theta_1)$ over $\theta_0, \theta_1$]
If the function has multiple local minima, where one starts can determine which minimum is reached.

[Figure: surface plot of $J(\theta_0, \theta_1)$ with multiple local minima]
Gradient descent algorithm

repeat until convergence {
  $\theta_j := \theta_j - \alpha \frac{\partial}{\partial \theta_j} J(\theta_0, \theta_1)$   (for $j = 0$ and $j = 1$; $\alpha$ is the learning rate)
}

Update $\theta_0$ and $\theta_1$ simultaneously:
  temp0 := $\theta_0 - \alpha \frac{\partial}{\partial \theta_0} J(\theta_0, \theta_1)$
  temp1 := $\theta_1 - \alpha \frac{\partial}{\partial \theta_1} J(\theta_0, \theta_1)$
  $\theta_0$ := temp0
  $\theta_1$ := temp1
“Batch” gradient descent: each step of gradient descent uses all $m$ training examples.
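A runnable sketch of batch gradient descent for one feature (the function name, toy data, learning rate, and iteration count are my own choices):

```python
import numpy as np

def gradient_descent(x, y, alpha=0.1, iters=2000):
    """Batch gradient descent for h(x) = theta0 + theta1 * x.

    'Batch': every update sums over all m training examples.
    """
    m = len(y)
    theta0, theta1 = 0.0, 0.0
    for _ in range(iters):
        err = (theta0 + theta1 * x) - y           # h(x^(i)) - y^(i)
        # Compute both new values from the *old* parameters, then assign:
        # this is the simultaneous update described above.
        temp0 = theta0 - alpha * np.sum(err) / m
        temp1 = theta1 - alpha * np.sum(err * x) / m
        theta0, theta1 = temp0, temp1
    return theta0, theta1

# Toy data lying exactly on y = 1 + 2x.
x = np.array([1.0, 2.0, 3.0])
y = np.array([3.0, 5.0, 7.0])
theta0, theta1 = gradient_descent(x, y)
print(theta0, theta1)   # converges close to (1.0, 2.0)
```

Note that both temporaries are computed before either parameter is overwritten; updating $\theta_0$ first and then using the new value inside the $\theta_1$ update would not be the algorithm above.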
Multiple features (variables).

Size (feet²) | Bedrooms | Floors | Age (years) | Price ($1000)
        2104 |        5 |      1 |          45 |           460
        1416 |        3 |      2 |          40 |           232
        1534 |        3 |      2 |          30 |           315
         852 |        2 |      1 |          36 |           178
           … |        … |      … |           … |             …
Notation:
$n$ = number of features
$m$ = number of training examples
$x^{(i)}$ = input (features) of the $i$-th training example
$x_j^{(i)}$ = value of feature $j$ in the $i$-th training example
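To make the indexing concrete, here is a small NumPy sketch of the housing data above (array names and the column interpretation are my reading of the table, not given in the slides):

```python
import numpy as np

# Housing training set, one row per example.
# Columns (assumed): size (feet^2), bedrooms, floors, age (years).
X = np.array([[2104, 5, 1, 45],
              [1416, 3, 2, 40],
              [1534, 3, 2, 30],
              [ 852, 2, 1, 36]], dtype=float)
y = np.array([460, 232, 315, 178], dtype=float)   # price in $1000s

m, n = X.shape     # m = 4 training examples, n = 4 features
x_2 = X[1]         # x^(2): features of the 2nd example (row index 1, 0-based)
x_2_3 = X[1, 2]    # x_3^(2): feature 3 of the 2nd example -> 2.0
print(m, n, x_2, x_2_3)
```

The off-by-one between the 1-based notation of the slides and 0-based array indexing is a common source of bugs.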
Hypothesis:
Previously: $h_\theta(x) = \theta_0 + \theta_1 x$
Now: $h_\theta(x) = \theta_0 + \theta_1 x_1 + \theta_2 x_2 + \cdots + \theta_n x_n$
With the convention $x_0 = 1$: $h_\theta(x) = \theta^T x$

Parameters: $\theta_0, \theta_1, \ldots, \theta_n$ (equivalently, the vector $\theta \in \mathbb{R}^{n+1}$)

Cost function:
$J(\theta) = \frac{1}{2m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right)^2$
Gradient descent:
Repeat {
  $\theta_j := \theta_j - \alpha \frac{\partial}{\partial \theta_j} J(\theta)$
}   (simultaneously update $\theta_j$ for every $j$)

Previously ($n = 1$):
Repeat {
  $\theta_0 := \theta_0 - \alpha \frac{1}{m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right)$
  $\theta_1 := \theta_1 - \alpha \frac{1}{m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right) x^{(i)}$
}   (simultaneously update $\theta_0, \theta_1$)

New algorithm ($n \ge 1$):
Repeat {
  $\theta_j := \theta_j - \alpha \frac{1}{m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right) x_j^{(i)}$
}   (simultaneously update $\theta_j$ for $j = 0, \ldots, n$)
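The multivariate update can be written in one vectorized line. A NumPy sketch (names and toy data are my own; the update applies the rule above to all $\theta_j$ at once):

```python
import numpy as np

def gradient_descent_multi(X, y, alpha=0.1, iters=2000):
    """Gradient descent for h(x) = theta^T x with the convention x0 = 1.

    Vectorized form of the per-coordinate update:
      theta := theta - alpha * (1/m) * X^T (X theta - y)
    """
    m = X.shape[0]
    Xb = np.hstack([np.ones((m, 1)), X])    # prepend the x0 = 1 column
    theta = np.zeros(Xb.shape[1])
    for _ in range(iters):
        grad = Xb.T @ (Xb @ theta - y) / m  # (1/m) * sum_i (h - y) * x_j^(i)
        theta = theta - alpha * grad        # simultaneous update of every theta_j
    return theta

# Toy data with two features, generated from y = 1 + 2*x1 + 3*x2.
X = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
y = np.array([1.0, 3.0, 4.0, 6.0])
theta = gradient_descent_multi(X, y)
print(theta)   # approaches [1.0, 2.0, 3.0]
```

Because `theta` is replaced in a single assignment, every component is computed from the old parameter vector, which is exactly the simultaneous update the slides require.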
Practical aspects of applying gradient descent
Feature Scaling
Idea: make sure the features are on a similar scale (e.g. get every feature into approximately the range $-1 \le x_j \le 1$), so that gradient descent converges faster.

Mean normalization:
Replace $x_j$ with $x_j - \mu_j$ to make the features have approximately zero mean (do not apply to $x_0 = 1$). Commonly one also divides by a scale $s_j$ (the range or standard deviation): $x_j := \frac{x_j - \mu_j}{s_j}$.
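A minimal sketch of mean normalization in NumPy (names are mine; dividing by the range $s_j = \max - \min$ is one common choice):

```python
import numpy as np

def mean_normalize(X):
    """Replace each feature x_j with (x_j - mu_j) / s_j.

    mu_j is the feature's mean and s_j its range (max - min), so the result
    has roughly zero mean and lies in about [-1, 1]. Never apply this to
    the intercept feature x0 = 1.
    """
    mu = X.mean(axis=0)
    s = X.max(axis=0) - X.min(axis=0)   # the standard deviation also works
    return (X - mu) / s, mu, s

# Features on very different scales: size in feet^2 vs. bedroom count.
X = np.array([[2104.0, 5.0], [1416.0, 3.0], [1534.0, 3.0], [852.0, 2.0]])
X_norm, mu, s = mean_normalize(X)
print(X_norm.mean(axis=0))   # approximately [0, 0]
```

Returning `mu` and `s` matters in practice: any new example must be normalized with the training set's statistics before the learned hypothesis is applied to it.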
[Figure: price (y) vs. size (x)]