Linear Regression
Introduction
Simple regression
Making predictions
Cost function
Gradient descent
Training
Model evaluation
Summary
Multivariable regression
Growing complexity
Normalization
Making predictions
Initialize weights
Cost function
Gradient descent
Simplifying with matrices
Bias term
Model evaluation
Introduction
Linear Regression is a supervised machine learning algorithm where the predicted output is continuous
and has a constant slope. It's used to predict values within a continuous range (e.g. sales, price) rather
than trying to classify them into categories (e.g. cat, dog). There are two main types:
Simple regression
Simple linear regression uses traditional slope-intercept form, where m and b are the variables our
algorithm will try to "learn" to produce the most accurate predictions. x represents our input data and
y represents our prediction.
y = mx + b
Multivariable regression
A more complex, multi-variable linear equation might look like this, where w represents the coefficients, or
weights, our model will try to learn.

f(x, y, z) = w_1 x + w_2 y + w_3 z
The variables x, y, z represent the attributes, or distinct pieces of information, we have about each
observation. For sales predictions, these attributes might include a company's advertising spend on radio,
TV, and newspapers.

Sales = w_1 Radio + w_2 TV + w_3 News
Simple regression
Let’s say we are given a dataset with the following columns (features): how much a company spends on
Radio advertising each year and its annual Sales in terms of units sold. We are trying to develop an
equation that will let us predict units sold based on how much a company spends on radio advertising.
The rows (observations) represent companies.
Making predictions
Our prediction function outputs an estimate of sales given a company's radio advertising spend and our
current values for Weight and Bias.
Sales = Weight ⋅ Radio + Bias
Weight
the coefficient for the Radio independent variable. In machine learning we call coefficients weights.
Radio
the independent variable. In machine learning we call these variables features.
Bias
the intercept where our line crosses the y-axis. In machine learning we call intercepts bias. Bias
offsets all of our predictions.
Our algorithm will try to learn the correct values for Weight and Bias. By the end of our training, our
equation will approximate the line of best fit.
Code
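A minimal sketch of this prediction function (the name predict_sales is illustrative):

def predict_sales(radio, weight, bias):
    return weight * radio + bias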
Cost function
The prediction function is nice, but for our purposes we don't really need it. What we need is a
:doc:`cost function <loss_functions>` so we can start optimizing our weights.
Let's use :ref:`mse` as our cost function. MSE measures the average squared difference between an
observation's actual and predicted values. The output is a single number representing the cost, or score,
associated with our current set of weights. Our goal is to minimize MSE to improve the accuracy of our
model.
Math
Given our simple linear equation y = mx + b, we can calculate MSE as:

MSE = \frac{1}{N} \sum_{i=1}^{n} (y_i - (mx_i + b))^2
Code
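A sketch of MSE as a cost function, assuming radio and sales are parallel lists of observations:

def cost_function(radio, sales, weight, bias):
    companies = len(radio)
    total_error = 0.0
    for i in range(companies):
        # Squared difference between actual and predicted sales
        total_error += (sales[i] - (weight * radio[i] + bias))**2
    return total_error / companies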
Gradient descent
To minimize MSE we use :doc:`gradient_descent` to calculate the gradient of our cost function. Gradient
descent works by measuring the error our current weight gives us, using the derivative of the cost
function to find the gradient (the slope of the cost function at our current weight), and then updating
our weight in the direction opposite the gradient. We move opposite the gradient because the gradient
points up the slope, so stepping against it decreases our error.
Math
There are two :ref:`parameters <glossary_parameters>` (coefficients) in our cost function we can control:
weight m and bias b. Since we need to consider the impact each one has on the final prediction, we use
partial derivatives. To find the partial derivatives, we use the :ref:`chain_rule`. We need the chain rule
because (y_i - (mx_i + b))^2 is really 2 nested functions: the inner function y_i - (mx_i + b) and the outer
function x^2.
Returning to our cost function:
f(m, b) = \frac{1}{N} \sum_{i=1}^{n} (y_i - (mx_i + b))^2
Using the following:
(y_i - (mx_i + b))^2 = A(B(m, b))
We can split the derivative into
A(x) = x^2

\frac{df}{dx} = A'(x) = 2x

and

B(m, b) = y_i - (mx_i + b) = y_i - mx_i - b

\frac{dx}{dm} = B'(m) = 0 - x_i - 0 = -x_i

\frac{dx}{db} = B'(b) = 0 - 0 - 1 = -1
And then using the :ref:`chain_rule` which states:
\frac{df}{dm} = \frac{df}{dx} \cdot \frac{dx}{dm}

\frac{df}{db} = \frac{df}{dx} \cdot \frac{dx}{db}
We then plug in each of the parts to get the following derivatives:

\frac{df}{dm} = A'(B(m, b)) \cdot B'(m) = 2(y_i - (mx_i + b)) \cdot -x_i

\frac{df}{db} = A'(B(m, b)) \cdot B'(b) = 2(y_i - (mx_i + b)) \cdot -1
We can calculate the gradient of this cost function as:
f'(m, b) = \begin{bmatrix} \frac{df}{dm} \\ \frac{df}{db} \end{bmatrix} = \begin{bmatrix} \frac{1}{N} \sum -2x_i(y_i - (mx_i + b)) \\ \frac{1}{N} \sum -2(y_i - (mx_i + b)) \end{bmatrix}
Code
To solve for the gradient, we iterate through our data points using our new weight and bias values and
take the average of the partial derivatives. The resulting gradient tells us the slope of our cost function at
our current position (i.e. weight and bias) and the direction we should update to reduce our cost function
(we move in the direction opposite the gradient). The size of our update is controlled by the
:ref:`learning rate <glossary_learning_rate>`.
def update_weights(radio, sales, weight, bias, learning_rate):
    weight_deriv = 0
    bias_deriv = 0
    companies = len(radio)
    for i in range(companies):
        # Calculate partial derivatives:
        # -2x(y - (mx + b)) and -2(y - (mx + b))
        weight_deriv += -2 * radio[i] * (sales[i] - (weight * radio[i] + bias))
        bias_deriv += -2 * (sales[i] - (weight * radio[i] + bias))
    # Average the derivatives and step opposite the gradient
    weight -= (weight_deriv / companies) * learning_rate
    bias -= (bias_deriv / companies) * learning_rate
    return weight, bias
Training
Training a model is the process of iteratively improving your prediction equation by looping through the
dataset multiple times, each time updating the weight and bias values in the direction indicated by the
slope of the cost function (gradient). Training is complete when we reach an acceptable error threshold, or
when subsequent training iterations fail to reduce our cost.
Before training we need to initialize our weights (set default values), set our
:ref:`hyperparameters <glossary_hyperparameters>` (learning rate and number of iterations), and prepare
to log our progress over each iteration.
Code
def train(radio, sales, weight, bias, learning_rate, iters):
    cost_history = []
    for i in range(iters):
        weight, bias = update_weights(radio, sales, weight, bias, learning_rate)
        # Calculate cost for auditing purposes
        cost = cost_function(radio, sales, weight, bias)
        cost_history.append(cost)
    return weight, bias, cost_history
Model evaluation
If our model is working, we should see our cost decrease after every iteration.
Logging
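A sketch of per-iteration logging, placed inside the training loop above (the format is illustrative):

if i % 10 == 0:
    print(f"iter={i}  weight={weight:.2f}  bias={bias:.4f}  cost={cost:.2f}")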
Visualizing
Cost history
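One way to visualize the cost history returned by train, assuming matplotlib is available:

import matplotlib.pyplot as plt

plt.plot(cost_history)
plt.xlabel("Iteration")
plt.ylabel("MSE cost")
plt.title("Cost history")
plt.show()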
Summary
By learning the best values for weight (.46) and bias (.25), we now have an equation that predicts future
sales based on radio advertising investment.
Sales = .46 Radio + .25
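For illustration, a company that spends 50 (in the dataset's units) on radio advertising would be predicted
to sell .46 ⋅ 50 + .25 = 23.25 units.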
How would our model perform in the real world? I’ll let you think about it :)
Multivariable regression
Let’s say we are given data on TV, radio, and newspaper advertising spend for a list of companies, and
our goal is to predict sales in terms of units sold.
Growing complexity
As the number of features grows, the complexity of our model increases and it becomes increasingly
difficult to visualize, or even comprehend, our data.
One solution is to break the data apart and compare 1-2 features at a time. In this example we explore
how Radio and TV investment impacts Sales.
Normalization
As the number of features grows, calculating the gradient takes longer. We can speed this up by
"normalizing" our input data to ensure all values are within the same range. This is especially important for
datasets with high standard deviations or differences in the ranges of the attributes. Our goal now will be to
normalize our features so they are all in the range -1 to 1.
Code
import numpy as np

def normalize(features):
    '''
    features   -  (200, 3)
    features.T -  (3, 200)

    We transpose the input matrix, swapping
    cols and rows to make vector math easier.
    '''
    for feature in features.T:
        fmean = np.mean(feature)
        frange = np.amax(feature) - np.amin(feature)

        # Vector subtraction
        feature -= fmean

        # Vector division
        feature /= frange

    return features
Note
Matrix math. Before we continue, it's important to understand basic :doc:`linear_algebra` concepts
as well as numpy functions like numpy.dot().
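As a quick refresher, here is numpy.dot on the shapes we use below (values are made up):

import numpy as np

X = np.array([[1., 2., 3.],
              [4., 5., 6.]])        # (2, 3) feature matrix
W = np.array([[.1], [.2], [.3]])    # (3, 1) weight vector
print(np.dot(X, W))                 # (2, 1) predictions: [[1.4], [3.2]]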
Making predictions
Our predict function outputs an estimate of sales given our current weights (coefficients) and a company's
TV, radio, and newspaper spend. Our model will try to identify weight values that most reduce our cost
function.
Sales = W_1 TV + W_2 Radio + W_3 Newspaper
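A minimal sketch of this prediction step, assuming numpy is imported as np and a (200, 3) feature matrix:

def predict(features, weights):
    '''
    features - (200, 3)
    weights  - (3, 1)
    returns predictions - (200, 1)
    '''
    return np.dot(features, weights)

Initialize weights

Before training we also need starting values for our weights. Starting from zero is a simple choice (an
assumption here, not the only option):

W1 = 0.0
W2 = 0.0
W3 = 0.0
weights = np.array([
    [W1],
    [W2],
    [W3]
])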
Cost function
Now we need a cost function to audit how our model is performing. The math is the same, except we swap
the mx + b expression for W_1 x_1 + W_2 x_2 + W_3 x_3. We also divide the expression by 2 to make derivative
calculations simpler.

MSE = \frac{1}{2N} \sum_{i=1}^{n} (y_i - (W_1 x_1 + W_2 x_2 + W_3 x_3))^2
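A sketch of this cost function, reusing the predict function above:

def cost_function(features, targets, weights):
    '''
    features - (200, 3), targets - (200, 1), weights - (3, 1)
    returns the average squared error between targets and predictions
    '''
    N = len(targets)
    sq_error = (predict(features, weights) - targets)**2
    # Divide by 2N to match the formula above
    return sq_error.sum() / (2.0 * N)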
Gradient descent
Again using the :ref:`chain_rule` we can compute the gradient: a vector of partial derivatives describing
the slope of the cost function for each weight.
f'(W_1) = -x_1(y - (W_1 x_1 + W_2 x_2 + W_3 x_3))
f'(W_2) = -x_2(y - (W_1 x_1 + W_2 x_2 + W_3 x_3))
f'(W_3) = -x_3(y - (W_1 x_1 + W_2 x_2 + W_3 x_3))
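A sketch of how these derivatives might be applied one weight at a time, assuming numpy as np and the
predict function above (the error is flattened so each (200,) feature column multiplies element-wise):

def update_weights(features, targets, weights, lr):
    '''
    features - (200, 3), targets - (200, 1), weights - (3, 1)
    '''
    error = (targets - predict(features, weights)).flatten()

    # One partial derivative per feature: -x * (y - prediction)
    d_w1 = -features[:, 0] * error
    d_w2 = -features[:, 1] * error
    d_w3 = -features[:, 2] * error

    # Subtract the mean derivative, scaled by the learning rate
    # (the gradient points in the direction of steepest ascent)
    weights[0][0] -= lr * np.mean(d_w1)
    weights[1][0] -= lr * np.mean(d_w2)
    weights[2][0] -= lr * np.mean(d_w3)
    return weights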
Simplifying with matrices

The per-weight update above has a lot of duplication. Vectorized gradient descent removes it: instead of
updating each weight separately, we operate on the whole feature matrix X and weight vector at once using
matrix multiplication.

X = [
    [x1, x2, x3]
    [x1, x2, x3]
    .
    .
    .
    [x1, x2, x3]
]

targets = [
    [1],
    [2],
    [3]
]

A sketch of the vectorized update, using the predict function and shapes from above:

def update_weights_vectorized(X, targets, weights, lr):
    '''
    gradient = X.T * (predictions - targets) / N
    X - (200, 3), targets - (200, 1), weights - (3, 1)
    '''
    companies = len(X)

    #1 - Get Predictions
    predictions = predict(X, weights)

    #2 - Calculate error/loss
    error = targets - predictions

    #3 - Transpose X from (200, 3) to (3, 200) and multiply by the
    #    (200, 1) error matrix, returning a (3, 1) matrix that holds
    #    one partial derivative per feature: the aggregate slope of
    #    the cost function across all observations
    gradient = np.dot(-X.T, error)

    #4 - Take the average error derivative for each feature
    gradient /= companies

    #5 - Multiply the gradient by our learning rate
    gradient *= lr

    #6 - Subtract from our weights to minimize cost
    weights -= gradient

    return weights
Bias term
Our train function is the same as for simple linear regression; however, we're going to make one final tweak
before running it: add a :ref:`bias term <glossary_bias_term>` to our feature matrix.
In our example, it's very unlikely that sales would be zero if companies stopped advertising. Possible
reasons for this might include past advertising, existing customer relationships, retail locations, and
salespeople. A bias term will help us capture this base case.
Code
Below we add a constant 1 to every row of our features matrix. Because this input never changes, the
weight our model learns for it acts as a constant offset: our bias term.
bias = np.ones(shape=(len(features),1))
features = np.append(bias, features, axis=1)
Model evaluation
After training our model through 1000 iterations with a learning rate of .0005, we finally arrive at a set of
weights we can use to make predictions:
Sales = 4.7 TV + 3.5 Radio + .81 Newspaper + 13.9
Our MSE cost dropped from 110.86 to 6.25.