Feature Engineering and Polynomial Regression
Goals
In this lab you will:
• explore feature engineering and polynomial regression, which allow you to use the machinery of linear regression to fit complicated, even highly non-linear, functions.
Tools
You will utilize the functions developed in previous labs as well as matplotlib and NumPy.
import numpy as np
import matplotlib.pyplot as plt
from lab_utils_multi import zscore_normalize_features, run_gradient_descent_feng
np.set_printoptions(precision=2)  # reduced display precision on numpy arrays
What if your features/data are non-linear or are combinations of features? For example, housing prices do not tend to be linear with living area; the market penalizes very small and very large houses, resulting in the curves shown in the graphic above. How can we use the machinery of linear regression to fit this curve? Recall, the 'machinery' we have is the ability to modify the parameters $\mathbf{w}$, $b$ in the linear model $f_{\mathbf{w},b}(x) = w_0x_0 + b \tag{1}$ to 'fit' the equation to the training data. However, no amount of adjusting of $\mathbf{w}$, $b$ in (1) will achieve a fit to a non-linear curve.
Polynomial Features
Above we were considering a scenario where the data was non-linear. Let's try using what we know so far to fit a non-linear curve. We'll start with a simple quadratic: $y = 1 + x^2$.
You're familiar with all the routines we're using. They are available in the lab_utils.py file for review. We'll use np.c_[..], a NumPy routine to concatenate along the column boundary.
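As a sketch of that first attempt, assuming run_gradient_descent_feng takes (X, y, iterations, alpha) and returns the fitted w and b (the exact signature lives in lab_utils_multi):
# create target data: a simple quadratic
x = np.arange(0, 20, 1)
y = 1 + x**2
X = x.reshape(-1, 1)  # gradient descent expects a 2-D matrix

# fit a plain linear model; no engineered features yet
model_w, model_b = run_gradient_descent_feng(X, y, iterations=1000, alpha=1e-2)

plt.scatter(x, y, marker='x', c='r', label="Actual Value")
plt.plot(x, X @ model_w + model_b, label="Predicted Value")
plt.title("no feature engineering")
plt.xlabel("X"); plt.ylabel("y"); plt.legend(); plt.show()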
Well, as expected, not a great fit. What is needed is something like $y = w_0x_0^2 + b$, or a polynomial feature. To accomplish this, you can modify the input data to engineer the needed features. If you swap the original data with a version that squares the $x$ value, then you can achieve $y = w_0x_0^2 + b$. Let's try it. Swap X for X**2 below:
# Engineer features
X = x**2              #<-- added engineered feature
X = X.reshape(-1, 1)  # X should be a 2-D matrix
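A fit with the squared feature might then look like the following sketch; the learning rate and iteration count here are illustrative guesses, not prescribed values:
model_w, model_b = run_gradient_descent_feng(X, y, iterations=10000, alpha=1e-5)

plt.scatter(x, y, marker='x', c='r', label="Actual Value")
plt.plot(x, X @ model_w + model_b, label="Predicted Value")
plt.title("Added x**2 feature")
plt.xlabel("x"); plt.ylabel("y"); plt.legend(); plt.show()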
Great! Near perfect fit. Notice the values of $\mathbf{w}$ and $b$ printed right above the graph: w,b found by gradient descent: w: [1.], b: 0.0490. Gradient descent modified our initial values of $\mathbf{w}$, $b$ to be (1.0, 0.049), or a model of $y = 1 \cdot x_0^2 + 0.049$, very close to our target of $y = 1 \cdot x_0^2 + 1$. If you ran it longer, it could be a better match.
Selecting Features
Above, we knew that an $x^2$ term was required. It may not always be obvious which features are required. One could add a variety of potential features to try and find the most useful. For example, what if we had instead tried: $y = w_0x_0 + w_1x_1^2 + w_2x_2^3 + b$?
# engineer features .
X = np.c_[x, x**2, x**3] #<-- added engineered feature
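A sketch of the full cell, under the same assumed helper signature; the very small learning rate is illustrative, chosen because the un-normalized x**3 feature has a large range:
# create target data (as in the earlier cells)
x = np.arange(0, 20, 1)
y = x**2
X = np.c_[x, x**2, x**3]   # the engineered features from above

model_w, model_b = run_gradient_descent_feng(X, y, iterations=10000, alpha=1e-7)

plt.scatter(x, y, marker='x', c='r', label="Actual Value")
plt.plot(x, X @ model_w + model_b, label="Predicted Value")
plt.title("x, x**2, x**3 features")
plt.xlabel("x"); plt.ylabel("y"); plt.legend(); plt.show()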
Note the value of $\mathbf{w}$, [0.08 0.54 0.03], and $b$, 0.0106. This implies the model after fitting/training is:
$0.08x + 0.54x^2 + 0.03x^3 + 0.0106$
Gradient descent has emphasized the data that is the best fit to the $x^2$ data by increasing the $w_1$ term relative to the others. If you were to run for a very long time, it would continue to reduce the impact of the other terms.
Gradient descent is picking the 'correct' features for us by emphasizing its associated parameter:
• Initially, the features were re-scaled so they are comparable to each other
• a smaller weight value implies a less important/correct feature; in the extreme, when the weight becomes zero or very close to zero, the associated feature is not useful in fitting the model to the data
• above, after fitting, the weight associated with the $x^2$ feature is much larger than the weights for $x$ or $x^3$ as it is the most useful in fitting the data
An Alternate View
Above, polynomial features were chosen based on how well they matched the target data.
Another way to think about this is to note that we are still using linear regression once we have
created new features. Given that, the best features will be linear relative to the target. This is
best understood with an example.
# create target data
x = np.arange(0, 20, 1)
y = x**2
# engineer features .
X = np.c_[x, x**2, x**3] #<-- added engineered feature
X_features = ['x','x^2','x^3']
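Plotting each engineered feature against the target makes this visible; a minimal sketch using the arrays defined above:
fig, ax = plt.subplots(1, 3, figsize=(12, 3), sharey=True)
for i in range(len(ax)):
    ax[i].scatter(X[:, i], y)        # one panel per engineered feature
    ax[i].set_xlabel(X_features[i])
ax[0].set_ylabel("y")
plt.show()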
Above, it is clear that the $x^2$ feature mapped against the target value $y$ is linear. Linear regression can then easily generate a model using that feature.
Scaling features
As described in the last lab, if the data set has features with significantly different scales, one should apply feature scaling to speed gradient descent. In the example above, there are $x$, $x^2$ and $x^3$, which will naturally have very different scales. Let's apply Z-score normalization to our example.
# apply z-score normalization to the engineered features
X = zscore_normalize_features(X)
print(f"Peak to Peak range by column in Normalized X: {np.ptp(X,axis=0)}")

# re-create the target data for the normalized fit
x = np.arange(0, 20, 1)
y = x**2
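With normalized features, gradient descent can use a much larger learning rate. A minimal sketch of the fit, assuming the run_gradient_descent_feng signature used earlier; the alpha and iteration values are illustrative:
X = np.c_[x, x**2, x**3]
X = zscore_normalize_features(X)

model_w, model_b = run_gradient_descent_feng(X, y, iterations=100000, alpha=1e-1)

plt.scatter(x, y, marker='x', c='r', label="Actual Value")
plt.plot(x, X @ model_w + model_b, label="Predicted Value")
plt.title("Normalized x, x**2, x**3 features")
plt.xlabel("x"); plt.ylabel("y"); plt.legend(); plt.show()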
Complex Functions
With feature engineering, even quite complex functions can be modeled:
x = np.arange(0,20,1)
y = np.cos(x/2)
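One hedged sketch of such a model: engineer a long list of polynomial features and normalize them before fitting. The choice of powers (up to x**13) and the hyperparameters are illustrative, not the only ones that work:
X = np.c_[x, x**2, x**3, x**4, x**5, x**6, x**7,
          x**8, x**9, x**10, x**11, x**12, x**13]
X = zscore_normalize_features(X)

model_w, model_b = run_gradient_descent_feng(X, y, iterations=1000000, alpha=1e-1)

plt.scatter(x, y, marker='x', c='r', label="Actual Value")
plt.plot(x, X @ model_w + model_b, label="Predicted Value")
plt.title("Normalized polynomial features vs. cos(x/2)")
plt.xlabel("x"); plt.ylabel("y"); plt.legend(); plt.show()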
Congratulations!
In this lab you:
• learned how linear regression can model complex, even highly non-linear functions using
feature engineering
• recognized that it is important to apply feature scaling when doing feature engineering