Scikit Learn - Extended Linear Modeling



This chapter focuses on the polynomial features and pipelining tools in Scikit-learn.

Introduction to Polynomial Features

Linear models trained on non-linear functions of the data generally maintain the fast performance of linear methods, while allowing them to fit a much wider range of data. That is why such linear models, trained on non-linear functions, are widely used in machine learning.

One such example is that a simple linear regression can be extended by constructing polynomial features from the data.

Mathematically, suppose we have a standard linear regression model; for 2-D data it would look like this −

$$Y=W_{0}+W_{1}X_{1}+W_{2}X_{2}$$

Now, we can combine the features into second-order polynomial terms and the model will look as follows −

$$Y=W_{0}+W_{1}X_{1}+W_{2}X_{2}+W_{3}X_{1}X_{2}+W_{4}X_1^2+W_{5}X_2^2$$

The above is still a linear model: it is linear in the coefficients W, so the resulting polynomial regression belongs to the same class of linear models and can be solved with the same techniques.
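To make this concrete, here is a minimal sketch in plain NumPy, using hypothetical randomly generated data and hypothetical true weights, showing that fitting the second-order model above is just ordinary least squares on an expanded feature matrix −

import numpy as np

# Hypothetical 2-D inputs (x1, x2); the values are arbitrary
rng = np.random.default_rng(0)
X = rng.random((10, 2))

# Expand each sample (x1, x2) into (1, x1, x2, x1*x2, x1**2, x2**2),
# matching the term order W0..W5 in the equation above
Z = np.column_stack([
   np.ones(len(X)),     # bias term
   X[:, 0],             # x1
   X[:, 1],             # x2
   X[:, 0] * X[:, 1],   # x1 * x2
   X[:, 0] ** 2,        # x1 squared
   X[:, 1] ** 2,        # x2 squared
])

# Generate targets from hypothetical true weights W0..W5
W_true = np.array([1., 2., -1., 0.5, 3., -2.])
y = Z @ W_true

# Because the model is linear in W, ordinary least squares recovers it
W_hat, *_ = np.linalg.lstsq(Z, y, rcond=None)
print(np.allclose(W_hat, W_true))   # True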

To construct such polynomial features, scikit-learn provides a transformer named PolynomialFeatures. It transforms an input data matrix into a new data matrix of a given degree.

Parameters

The following table consists of the parameters used by the PolynomialFeatures transformer −

Sr.No   Parameter & Description

1   degree − integer, default = 2

    It represents the degree of the polynomial features.

2   interaction_only − Boolean, default = False

    If set to True, only interaction features are produced, i.e. features that are products of at most degree distinct input features; pure powers such as X1² are excluded. See the snippet after this table.

3   include_bias − Boolean, default = True

    If True, the output includes a bias column, i.e. the feature in which all polynomial powers are zero; it acts as an intercept term in a linear model.

4   order − str in {'C', 'F'}, default = 'C'

    It represents the order of the output array in the dense case. 'F' order is faster to compute, but it may slow down subsequent estimators.
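As an illustration of the first three parameters (the input array below is arbitrary), the following snippet contrasts the default transform with interaction_only and include_bias changed −

from sklearn.preprocessing import PolynomialFeatures
import numpy as np

X = np.arange(6).reshape(3, 2)

# Default settings: columns are 1, x1, x2, x1^2, x1*x2, x2^2
print(PolynomialFeatures(degree=2).fit_transform(X))

# interaction_only=True drops the pure powers x1^2 and x2^2;
# include_bias=False drops the leading column of ones
poly = PolynomialFeatures(degree=2, interaction_only=True, include_bias=False)
print(poly.fit_transform(X))   # columns are x1, x2, x1*x2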

Attributes

The following table consists of the attributes used by the PolynomialFeatures transformer −

Sr.No   Attributes & Description

1   powers_ − array, shape (n_output_features, n_input_features)

    powers_[i, j] is the exponent of the jth input feature in the ith output feature.

2   n_input_features_ − int

    As the name suggests, it gives the total number of input features.

3   n_output_features_ − int

    As the name suggests, it gives the total number of polynomial output features.
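These attributes can be inspected after fitting; a short sketch follows (note that recent scikit-learn versions expose n_input_features_ as n_features_in_, so the snippet checks for both) −

from sklearn.preprocessing import PolynomialFeatures
import numpy as np

poly = PolynomialFeatures(degree=2).fit(np.arange(8).reshape(4, 2))

# powers_[i, j] is the exponent of input j in output i
print(poly.powers_)

# Recent scikit-learn versions renamed n_input_features_ to n_features_in_
n_in = poly.n_features_in_ if hasattr(poly, 'n_features_in_') else poly.n_input_features_
print(n_in)                      # 2
print(poly.n_output_features_)   # 6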

Implementation Example

The following Python script uses the PolynomialFeatures transformer to transform an array of 8 values, reshaped to (4, 2), into polynomial features −

from sklearn.preprocessing import PolynomialFeatures
import numpy as np

# Reshape the integers 0 to 7 into a 4 x 2 input matrix
Y = np.arange(8).reshape(4, 2)

# Expand each row into degree-2 polynomial features
poly = PolynomialFeatures(degree=2)
poly.fit_transform(Y)

Output

array([[ 1.,  0.,  1.,  0.,  0.,  1.],
       [ 1.,  2.,  3.,  4.,  6.,  9.],
       [ 1.,  4.,  5., 16., 20., 25.],
       [ 1.,  6.,  7., 36., 42., 49.]])
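In each output row, the six columns are the polynomial terms 1, X1, X2, X1², X1X2, X2² of the corresponding input row; for instance, the input row [2, 3] becomes [1, 2, 3, 4, 6, 9].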

Streamlining using Pipeline tools

The above sort of preprocessing, i.e. transforming an input data matrix into a new data matrix of a given degree, can be streamlined with the Pipeline tools, which are basically used to chain multiple estimators into one.

Example

The below Python script uses Scikit-learn's Pipeline tools to streamline the preprocessing (fitting data generated from an order-3 polynomial) −

#First, import the necessary packages.
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import Pipeline
import numpy as np

#Next, create a Pipeline object that chains PolynomialFeatures and LinearRegression.
Stream_model = Pipeline([('poly', PolynomialFeatures(degree=3)), ('linear', LinearRegression(fit_intercept=False))])

#Next, generate data from an order-3 polynomial and fit the model.
x = np.arange(5)
y = 3 - 2 * x + x ** 2 - x ** 3
Stream_model.fit(x[:, np.newaxis], y)

#Inspect the polynomial coefficients recovered by the linear model.
Stream_model.named_steps['linear'].coef_

Output

array([ 3., -2., 1., -1.])

The above output shows that the linear model trained on polynomial features is able to recover the exact input polynomial coefficients.
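As a quick follow-up (a hypothetical continuation of the script above), the fitted pipeline can also be used for prediction directly, applying the polynomial expansion and the linear model in a single call −

#Predict with the whole pipeline: expansion + linear model in one step
print(Stream_model.predict(np.array([[5]])))   # 3 - 2*5 + 5**2 - 5**3 = -107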
