
A Short Monograph on

Regression Model Selection

TO SERVE AS A REFRESHER FOR PGP-DSBA


Index

Contents
1. Introduction
   1.1 Introduction to Predictive Modelling
   1.2 Supervised and Unsupervised Learning
2. The Problem of Prediction: Bias Variance Trade-off
   2.1 Bias of a model
   2.2 Variance of a model
   2.3 Bias Variance Trade-off
   2.4 Training and Test data sets
   2.5 Cross-Validation
3. Model Selection
   3.1 Transformation of the response
   3.2 Information criteria: AIC and BIC
   3.3 Forward Selection Algorithm
   3.4 Backward Elimination
   3.5 Stepwise Regression
   3.6 All Possible Regression or Regression Subset Selection and Mallows’ Cp Criterion
   3.7 Choosing the Final Model

List of Figures
Fig. 1: Statistical Learning Flowchart
Fig. 2: Overfitting vs Underfitting vs Ideal Model
Fig. 3: Log likelihood vs Lambda Parameter
Fig. 4: Residual plot for 1/alcohol vs. other predictor variables
Fig. 5: Residual plot for 1/alcohol² vs. other predictor variables
Fig. 6: Flowchart for Regression Model Selection

List of Tables
Table 1: Description of the data (Data Dictionary)

1. Introduction
1.1 Introduction to Predictive Modelling

Technology has given us the power to capture data from almost everything: road accidents, grocery store bar codes, customer loyalty programs, and political opinions expressed on Twitter and Facebook. However, amassing data does not help us in any way unless we can understand the information contained in it. Predictive modeling and data mining help to detect the hidden patterns in the data and to make forecasts for yet-unobserved situations.
In this monograph and a few subsequent ones, various topics in predictive modeling and data mining will be taken up. But before we go into the details of predictive modeling, several major concepts need to be addressed.
What is the objective of predictive modeling?
Let us examine two different cases. Identification of spam and detection of fraudulent credit card transactions are two situations where the application of predictive modeling is important. In the former case, the objective is to detect spam email through a filter and classify it as such. In the latter case, banks want to identify frauds immediately and red-flag the transactions.
In both cases, prediction accuracy needs to be very high, though how the spam was filtered or
fraud was detected may not be so important. In this case, the model may be complex and the interpretability of the model low. Often these models are known as ‘black box’ models.
Automation will work well here.
Consider another case where a physician needs to predict whether a post-menopausal woman above 65 years of age is at risk of knee replacement surgery, given various other health indicators. Here a ‘black box’ model may not be acceptable despite having high accuracy. The reason is that it is not enough to know who is at risk; it is also necessary to mitigate that risk. Unless the physician can understand which health indicators are more important, she will not be able to provide an effective treatment regime for her patient. Interpretability is a must in this situation. A black box model may be rejected in favor of an interpretable model, even though the former may be more accurate.
That is not to say predictive accuracy needs to be sacrificed totally to improve interpretability.
Somewhere a balance needs to be struck. The suitability of the predictive model is an important
issue that may need to be addressed case by case.

Predictive Analytics or Predictive Modelling is a process of extracting information from large, complex data sets using a variety of computational tools to make predictions and estimates about future outcomes.

1.2 Supervised and Unsupervised Learning

All problems of data mining, pattern recognition, or predictive modeling come under the
umbrella known as Statistical Learning.
Statistical learning problems can be partitioned into Supervised or Unsupervised Learning.
The problems discussed in the previous section belong to the first category. The objective in
each case is to predict a response, typically denoted by Y. Corresponding to each unit of
observation there is a set of independent variables or predictors (X), based on which Y is
estimated (see the monograph on Regression). Whenever in the data set the response is
available, the problem falls under supervised learning.
Supervised learning can be further divided into two classes, depending on the nature of the
response Y. If Y is a continuous variable, the problem falls under the category Regression. On
the other hand, if the response is qualitative, binary, or multi-class, the problem falls under the
category Classification. Spam identification (Yes/No) and detection of fraud (Yes/No) are both
classification problems.
The problem of assigning a risk value to a patient may be considered a classification problem
if the risk is defined as low, medium, or high. If, however, a continuous risk probability is to be
estimated for each patient, the problem is considered a regression problem.
Unsupervised learning problems are those where there is no response. One example of an
unsupervised learning problem is to categorize loyalty customers into a Gold, Silver, and Bronze classification, depending on their propensity to spend in a store. Detection of possible
clusters in a multivariate data set is an unsupervised learning problem.
Another example of unsupervised learning is the extraction of factors from a complex data set
(see the monograph on Dimension Reduction).
An illustrative figure with a few techniques of the different types of learning is shown below.

The figure gives a few examples of classification and regression techniques; the examples provided are by no means comprehensive. (Special note: neural networks may also be used where the response is continuous, and logistic regression may also include lasso regularization. But for initial understanding these examples are illustrative.)

[Fig. 1: Statistical Learning Flowchart. Supervised learning branches into Regression (e.g. linear regression, regression tree) and Classification; unsupervised learning branches into cluster analysis (hierarchical clustering, K-means clustering) and PCA-FA.]

2. The Problem of Prediction: Bias
Variance Trade-off
However accurate the prediction models are, there will always be a prediction error, the
difference between the predicted and actual observations. Recall that for regression this error is
also known as residual. Prediction error has two components: bias and variance. To minimize
prediction error, both bias and variance need to be at a low level. Unfortunately, however, both
bias and variance cannot be minimized simultaneously.

Identification of a suitable prediction model involves a trade-off between its bias and variance.

Gaining a proper understanding of bias and variance helps avoid the mistakes of overfitting and under-fitting, with which they are intimately associated. A good model is one for which bias and variance are as small as possible and whose predictive ability is good.

The notions of bias and variance of a model are explained below.

Let us consider a model 𝑌=𝑓(𝑋)+𝜖, where 𝑓 is an arbitrary function of the independent variables
𝑋 and 𝜖 is the random error component. (Most models in predictive problems are of this type).
The function 𝑓 may be completely unknown or one may only have partial knowledge about its
form. For example, suppose 𝑓 is linear, i.e. 𝑓(𝑥) = 𝑎 + 𝑏𝑥. Unless the numerical values of 𝑎 and 𝑏 are known, only the type of the function 𝑓 is known but not the complete function. Suppose an estimate 𝑓̂ of 𝑓 is obtained. For any given value of 𝑋, the predicted value is 𝑌̂ = 𝑓̂(𝑋). Usually, 𝑓̂ is computed using the data 𝑦, and hence 𝑓̂(𝑥) is a random variable.

All the subsequent definitions and explanations in the next three sections will be based on this
model.

2.1 Bias of a model

Bias is the average difference between the predicted values of a model and the observed values.
𝐵𝑖𝑎𝑠=𝐸(𝑓̂ (𝑥)−𝑓(𝑥))
where E denotes expectation or mean. A model is called unbiased if 𝐵𝑖𝑎𝑠=0. Bias can be both
negative and positive, so a desirable condition for bias is that the absolute value of Bias is close
to 0.

Recall here that a multiple linear regression model is unbiased since the average value of the
least-squares estimate of residuals is 0.

A model with high bias pays very little attention to the dataset and oversimplifies the model.
This phenomenon is called under-fitting. Under-fitting may happen for various reasons. The
most common reasons include having only a limited amount of data to model a complex structure
or fitting a linear model to non-linear data. An oversimplified model exhibits small variability.

2.2 Variance of a model

Variance is defined as the variability of the predicted values, which quantifies the instability of the model.
Formally,
𝑉𝑎𝑟𝑖𝑎𝑛𝑐𝑒 = 𝐸[(𝑓̂(𝑥) − 𝐸(𝑓̂(𝑥)))²]
Unlike bias, variance can never be negative.

A model with high variance pays a lot of attention to the available data. Such models perform
very well on data used to fit the model but have limited power to predict for the data that has not
been observed. Hence it has high error rates on data used for prediction. This phenomenon is
called overfitting. Overfitting happens when a model is too close to the observed data and
captures the noise along with the underlying pattern in data.

Let us demonstrate the concepts of bias and variance through a visualization.

Observe n paired data points (𝑥1,𝑦1),(𝑥2,𝑦2),…,(𝑥𝑛,𝑦𝑛). The model 𝑦 = 𝑓(𝑥) + 𝜖 is fitted to the data and the estimate 𝑓̂ is obtained. If a new observation 𝑥 is one of 𝑥1,𝑥2,…,𝑥𝑛, then 𝑓̂ can be chosen so that 𝐸(𝑓̂(𝑥)) = 𝑓(𝑥) exactly, and hence the bias is zero when evaluated at the observed 𝑥 values. But if 𝑥 is different from 𝑥1, 𝑥2,…, 𝑥𝑛, then the estimate 𝑓̂ may behave poorly. The reason is that 𝑓̂ becomes very short-sighted: it tries to fit the observed data (𝑥𝑖,𝑦𝑖) as perfectly as possible and does not take into consideration any new data point that may arrive in the future.

The following picture illustrates these concepts.

The leftmost panel shows an almost perfect fit to the observed dataset, and thus the bias is very small. But for this model the variability is very high: as soon as one new observation is added to the data, the fitted function 𝑓̂(𝑥) may change considerably.

The middle panel shows a model that completely ignores the curvature in the dataset and fits a straight line through it. Clearly it does not fit the data well. This model varies only slightly from one sample to another, and the impact of adding a few new points may be negligible. Though in theory the variance of a model cannot be zero, for all practical purposes it may be considered so. But the bias of this model is very high.

The rightmost panel shows a much better fit. It does not have zero bias, nor does it enjoy
negligible variance, but it strikes a balance somewhere between the two and also fits the data
well.

The objective of predictive modelling is to find such an optimum f(x).

2.3 Bias Variance Trade-off


If a model is too simple and has only a few parameters, it may have high bias and low variance. On the other hand, if a model has a large number of parameters, it is likely to have high variance and low bias. So it is essential to find the right balance without overfitting or under-fitting the data. This trade-off between bias and variance is closely associated with a trade-off in model complexity.

The total squared error of a model can be expressed as


𝑇𝑜𝑡𝑎𝑙 𝑠𝑞𝑢𝑎𝑟𝑒𝑑 𝑒𝑟𝑟𝑜𝑟 = 𝐵𝑖𝑎𝑠² + 𝑉𝑎𝑟𝑖𝑎𝑛𝑐𝑒 + 𝐼𝑟𝑟𝑒𝑑𝑢𝑐𝑖𝑏𝑙𝑒 𝑒𝑟𝑟𝑜𝑟
It is not possible for any statistical model to manipulate irreducible error, which is inherent in the
data.

The goal of predictive modelling is to reduce 𝐵𝑖𝑎𝑠² + 𝑉𝑎𝑟𝑖𝑎𝑛𝑐𝑒. As described above, due to the bias-variance trade-off, no model can reduce both bias and variance simultaneously; therefore, a good model will try to minimize the sum 𝐵𝑖𝑎𝑠² + 𝑉𝑎𝑟𝑖𝑎𝑛𝑐𝑒.

In general, an unbiased model with the smallest possible variance is preferred. However, in special situations a biased model is deliberately chosen for which 𝐵𝑖𝑎𝑠² + 𝑉𝑎𝑟𝑖𝑎𝑛𝑐𝑒 is smaller than for any unbiased model.

Such a model has the best predictive power, i.e. it is able to provide the best possible estimated
value for a new observation.
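
To make the trade-off concrete, the following is a minimal simulation sketch (not part of the case study): the true function is assumed to be sin(x), and the squared bias and variance of the prediction at a single point x0 are estimated by refitting a very simple and a fairly flexible polynomial to many noisy samples.

import numpy as np

rng = np.random.default_rng(0)
f = np.sin                        # assumed "true" function for this toy simulation
x_train = np.linspace(0, 3, 20)   # fixed design points
x0, sigma = 1.5, 0.3              # test point and noise standard deviation

def fitted_value_at_x0(degree):
    # fit a polynomial of the given degree to one noisy sample and predict at x0
    y = f(x_train) + rng.normal(0, sigma, x_train.size)
    coeffs = np.polyfit(x_train, y, degree)
    return np.polyval(coeffs, x0)

for degree in (1, 6):             # degree 1: under-fits; degree 6: flexible fit
    preds = np.array([fitted_value_at_x0(degree) for _ in range(2000)])
    bias = preds.mean() - f(x0)   # Bias = E(f_hat(x0)) - f(x0)
    var = preds.var()             # Variance = E[(f_hat(x0) - E(f_hat(x0)))^2]
    print(f"degree {degree}: Bias^2 = {bias**2:.5f}, Variance = {var:.5f}")

The low-degree fit typically shows the larger squared bias and the more flexible fit the larger variance, mirroring the under-fitting and overfitting panels discussed above.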

2.4 Training and Test data sets

The discussion above establishes an important fact: unless the predictive ability of a model is tested on an independent data set, different from the one used to build the model, a vital aspect of the model is ignored. This necessitates splitting the existing data set into two or more parts.

Training data: A training dataset is used to fit one or more models and estimate their parameters.

Test data: A test dataset is used to assess the performance of the developed model. The test set should be as close as possible to the training dataset (more formally, have the same distribution) but have no overlap with it.

Typically, training and test data are partitions of the observed data into a random 80:20 split.
Other possible splits may be 75:25 or 70:30 or in some other ratio, all taken randomly. If the
data is very large even 50:50 split into training and test is also possible. Training data is larger
so that the model parameters are estimated with considerable accuracy. The purpose of having
a test data is to check how close the predicted values are to the observed values.

The choice of test data may also be modified for special applications. In certain predictive methods that are intended to be applied over a period of time (e.g. credit scoring), the test data are taken from the most recent period. For example, Q1, Q2, and Q3 data are used in the training set, while Q4 data are used as the test set for a model intended to project credit risk for the next quarter. In time series analysis, one of the most specialized predictive settings, only the most recent periods are used as test data. (This is discussed in detail in the Time Series Monograph.)

In the test data set no parameter estimation is performed.
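
As a toy illustration of the split (synthetic data and a plain linear model are assumed here; the wine case study later in this monograph uses train_test_split in exactly the same way):

import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 3))                                   # toy predictors
y = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(0, 0.5, 200)    # toy response

# 80:20 random split; parameters are estimated on the training part only
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
model = LinearRegression().fit(X_train, y_train)

# predictive ability is judged on the held-out test part
rmse = np.sqrt(np.mean((y_test - model.predict(X_test)) ** 2))
print("Test RMSE:", round(rmse, 3))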

2.5 Cross Validation


One criticism of the training and test data split is that the proposed model may depend on the split, since the split is done only once. One suggested alternative is to perform a k-fold cross-validation, which is an extension of the train-test split.

k-fold cross validation is a method of getting multiple sets of training and test data out of the original data set. The steps are as follows.
i) Split the whole dataset randomly into 𝑘 equal (or almost equal) parts.
ii) Choose one of these 𝑘 parts as the test dataset and the other 𝑘−1 parts together as the training dataset.
iii) Fit the model on the training dataset thus obtained and assess its performance on the test dataset. Usually, one predicts the values in the test dataset using this model and computes the root mean squared error (RMSE) of these predictions, but other measures of prediction error are also possible.
iv) Repeat this process 𝑘 times, once with each of the 𝑘 parts as the test data and the complementary set as the training data.
v) Finally, take the average of the 𝑘 RMSEs to get the final estimate of prediction error.
The most important advantage of k-fold cross-validation is that each data point is used exactly once for testing and 𝑘−1 times for training. Compared to the training-test split method, cross-validation is more computation-intensive.
The value of k is usually taken to be 10, but due to computational cost k = 5 is also a common choice. Note that the higher the value of k, the smaller the size of each test fold; that should be another consideration in choosing an optimum value of k. A minimal illustration is given below.
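
A minimal k-fold sketch, again on synthetic data with k = 5 and a plain linear model assumed, mirroring steps i) to v) above:

import numpy as np
from sklearn.model_selection import KFold
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(2)
X = rng.normal(size=(300, 4))
y = X @ np.array([2.0, 0.0, -1.0, 0.5]) + rng.normal(0, 1.0, 300)

k = 5
rmses = []
for train_idx, test_idx in KFold(n_splits=k, shuffle=True, random_state=42).split(X):
    model = LinearRegression().fit(X[train_idx], y[train_idx])   # fit on the k-1 training folds
    resid = y[test_idx] - model.predict(X[test_idx])             # predict the held-out fold
    rmses.append(np.sqrt(np.mean(resid ** 2)))                   # RMSE on that fold

print("Fold RMSEs:", np.round(rmses, 3))
print("Cross-validated RMSE estimate:", round(float(np.mean(rmses)), 3))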

3. Model Selection
In this section, several approaches to regression model selection are discussed.

3.1 Transformation of the response

In the monograph on Linear Regression, after fitting the linear regression models, a residual
analysis was performed to check if the regression assumptions (the LINE assumptions, see Sec
4.2.1) remained valid. Recall that one of the assumptions was normality. Another important
assumption was that the variances of the response were constant.
If either or both of these assumptions are violated, which can be detected from the residual plot, a transformation of the response may be necessary. Many transformations of the response are possible, but the most popular choices are given by the Box-Cox family of transformations.
Let 𝜆 be a constant, 𝜆 ≠ 0. The response 𝑌 is transformed to a new variable 𝑍 (say), where 𝑍 = (𝑌^𝜆 − 1)/𝜆. If 𝜆 = 0, the response is transformed to log(𝑌). The transformed response 𝑍 is expected to satisfy the regression assumptions. Moreover, it is clear from the functional form of the transformation that 𝑍 can be replaced by the simpler 𝑌^𝜆: the additive constant is absorbed into the intercept term, and the regression slope parameters are simply scaled by the constant 𝜆. If any predictor is significant in predicting (𝑌^𝜆 − 1)/𝜆, it is expected to be significant in predicting 𝑌^𝜆 as well. Therefore, the inference remains unchanged whether 𝑌^𝜆 or (𝑌^𝜆 − 1)/𝜆 is used.
That value of λ is chosen such that the log-likelihood curve reaches its maximum. This will be
explained in further detail through the case study.
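
As an aside, SciPy can also return the likelihood-maximizing 𝜆 directly instead of reading it off a plotted curve. The sketch below is an alternative to the case study's approach (which plots stats.boxcox_llf over a grid of 𝜆 values) and uses a toy, strictly positive response in place of the wine data.

import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
y = rng.gamma(shape=9.0, scale=1.2, size=500)      # toy strictly positive response

lam_mle = stats.boxcox_normmax(y, method='mle')    # lambda maximizing the Box-Cox log-likelihood
z, lam_used = stats.boxcox(y)                      # transformed response and the same MLE of lambda
print("MLE of lambda:", round(lam_mle, 2), round(lam_used, 2))
# In practice the fractional maximizer is usually rounded to a convenient value (e.g. -1, 0, 1).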

Case Study
[Refer to the monograph on Multiple Linear Regression]
A top wine manufacturer wants to invest in new technologies to improve its wine quality. Wine quality depends directly on the amount of alcohol in the wine and on its smoothness, which in turn is controlled by various chemicals either added directly during the manufacturing process or generated through chemical reactions. Wine certification and quality assessment are key elements for wine gradation and its price tag, and certification is determined by various physiochemical components of the wine. Therefore, the company wants to estimate the percentage (%) of alcohol in a bottle of wine as a function of its various chemical components.

Statement of the Problem: Develop the best possible model to predict the alcohol content of a red wine whose chemical component values are known.
Description of the data (Data Dictionary):

Variable                      Description
fixed acidity (FA)            Number of grams of tartaric acid per cubic decimetre (g/dm³)
volatile acidity (VA)         Number of grams of acetic acid per cubic decimetre (g/dm³)
citric acid (CA)              Number of grams of citric acid per cubic decimetre (g/dm³)
residual sugar (RS)           Number of grams of residual sugar per cubic decimetre (g/dm³)
chlorides                     Number of grams of sodium chloride per cubic decimetre (g/dm³)
free sulphur dioxide (FSD)    Number of milligrams of free sulphur dioxide per cubic decimetre (mg/dm³)
total sulphur dioxide (TSD)   Number of milligrams of total sulphur dioxide per cubic decimetre (mg/dm³)
density                       Number of grams per cubic centimetre (g/cm³)
pH                            Acidity on a scale of 0 to 14
sulphates                     Number of grams of potassium sulphate per cubic decimetre (g/dm³)
Brand (categorical)           Three brands of wine: 1 = “Grover Zampa”, 2 = “Seagram”, 3 = “Sula Vineyards”
alcohol (response)            Percentage volume of alcohol in wine (% vol.)

We have considered the problem of building a multiple linear regression model in the monograph on Linear Regression. In this monograph, the problem of selecting the best possible predictive model using this dataset is considered.
The dataset is split into a training and a test data set according to a random 80:20 allocation.
Training data contains 1279 observations since 80% of 1599 is approximately 1279. A simple
random sample (without replacement) of size 1279 is taken from the first 1599 positive integers,
and the training dataset is formed by selecting the rows of WineData corresponding to these
random numbers. The remaining (1599-1279=) 320 rows will constitute the test dataset.
import numpy as np
import pandas as pd
import seaborn as sns
from scipy import stats
from sklearn.model_selection import train_test_split
from statsmodels.formula.api import ols
import statsmodels.regression.linear_model as sm
import matplotlib.pyplot as plt

df = pd.read_csv('WineData.csv')
X = df.drop(['alcohol','ID'], axis=1)
y = df['alcohol']

# random 80:20 split into training and test data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
train_WineData = pd.concat([X_train, y_train], axis=1)
test_WineData = pd.concat([X_test, y_test], axis=1)

# dummy-code the categorical Brand column (drop_first leaves Seagram and Sula Vineyards indicators)
train_WineData = pd.concat([train_WineData, pd.get_dummies(train_WineData['Brand'], drop_first=True)], axis=1)
test_WineData = pd.concat([test_WineData, pd.get_dummies(test_WineData['Brand'], drop_first=True)], axis=1)
train_WineData['SulaVineyards'] = train_WineData['Sula Vineyards']
test_WineData['SulaVineyards'] = test_WineData['Sula Vineyards']
train_WineData.drop(['Brand','Sula Vineyards'], axis=1, inplace=True)
test_WineData.drop(['Brand','Sula Vineyards'], axis=1, inplace=True)

Thus the training dataset train_WineData is created, on which the candidate models will be built. The test dataset is test_WineData, on which the models will be validated by comparing their predictive ability.

Let us first determine the value 𝜆 to see whether any transformation is necessary. Note that
henceforth all model building activities will be carried on the training data set.

MLR_wine = ols(formula="alcohol~FA+VA+CA+RS+chloride+FSD+TSD+density+sulphate+pH+Seagram+SulaVineyards", data=train_WineData).fit()

# Box-Cox log-likelihood over a grid of lambda values
lmbdas = np.linspace(-3, 2)
llf = np.zeros(lmbdas.shape, dtype=float)
for ii, lmbda in enumerate(lmbdas):
    llf[ii] = stats.boxcox_llf(lmbda, MLR_wine.fittedvalues)
lmbda_optimal = lmbdas[np.argmax(llf)]   # lambda with the largest log-likelihood

fig = plt.figure()
ax = fig.add_subplot(111)
ax.plot(lmbdas, llf, 'b.-')
ax.axhline(stats.boxcox_llf(lmbda_optimal, MLR_wine.fittedvalues), color='r')
ax.set_xlabel('lmbda parameter')
ax.set_ylabel('log-likelihood')

[Fig. 3: Log likelihood vs Lambda parameter]

That value(s) of 𝜆 is (are) chosen for which the graph plotted above reaches its maximum. For
this data the maximum is attained somewhere between −1 and −2. For clarity of interpretation,
any fractional value of 𝜆 is ignored.

Taking 𝜆 = −1, the response is transformed to get new_resp1 and is modelled by MLR on the
predictors from the training dataset.
# lambda = -1
new_resp1 = 1/train_WineData['alcohol']
train_WineData['new_resp'] = new_resp1

MLR_wine1 = ols(formula="new_resp~FA+VA+CA+RS+chloride+FSD+TSD+density+sulphate+pH+Seagram+SulaVineyards", data=train_WineData).fit()

regression_plots(MLR_wine1, train_WineData)  # code for this method is made available in the Regression Monograph

[Fig. 4: Residual plots for 1/alcohol vs. other predictor variables]

It is clear that the regression assumptions are not violated. Next, 𝜆 = −2 transformation is
considered.

# lambda = -2
new_resp2 = 1/train_WineData['alcohol']**2
train_WineData['new_resp'] = new_resp2

MLR_wine2 = ols(formula="new_resp~FA+VA+CA+RS+chloride+FSD+TSD+density+sulphate+pH+Seagram+SulaVineyards", data=train_WineData).fit()

regression_plots(MLR_wine2, train_WineData)

[Fig. 5: Residual plots for 1/alcohol² vs. other predictor variables]

The regression assumptions are not violated in this case either, so both transformations, 𝜆 = −1 and 𝜆 = −2, may be considered. Whereas 𝜆 = −1 involves taking the reciprocal of the response (alcohol), 𝜆 = −2 implies squaring the response and then taking the reciprocal. This extra step, in the absence of any definite improvement, hurts interpretability. Therefore, the 𝜆 = −1 transformation is chosen.
In all the subsequent discussions, the transformation of the response with 𝜆 = −1 has been
used.
# lambda = -1
new_resp = 1/train_WineData['alcohol']
train_WineData['new_resp'] = new_resp
MLR_wine = ols(formula="new_resp~FA+VA+CA+RS+chloride+FSD+TSD+density+sulphate+pH+Seagram+SulaVineyards", data=train_WineData).fit()
print(MLR_wine.summary())

OLS Regression Results


==============================================================================
Dep. Variable: new_resp R-squared: 0.658
Model: OLS Adj. R-squared: 0.655
Method: Least Squares F-statistic: 202.8
Date: Wed, 06 Jan 2021 Prob (F-statistic): 1.83e-284
Time: 14:57:09 Log-Likelihood: 4871.4
No. Observations: 1279 AIC: -9717.
Df Residuals: 1266 BIC: -9650.
Df Model: 12
Covariance Type: nonrobust
=================================================================================
coef std err t P>|t| [0.025 0.975]
---------------------------------------------------------------------------------
Intercept -4.6667 0.134 -34.741 0.000 -4.930 -4.403
FA -0.0042 0.000 -20.435 0.000 -0.005 -0.004
VA -0.0040 0.001 -3.498 0.000 -0.006 -0.002
CA -0.0076 0.001 -5.582 0.000 -0.010 -0.005
RS -0.0023 0.000 -18.613 0.000 -0.002 -0.002
chloride 0.0164 0.004 4.357 0.000 0.009 0.024
FSD 1.723e-05 2.07e-05 0.832 0.406 -2.34e-05 5.79e-05
TSD 2.061e-05 6.91e-06 2.983 0.003 7.06e-06 3.42e-05
density 4.9303 0.138 35.752 0.000 4.660 5.201
sulphate -0.0098 0.001 -9.566 0.000 -0.012 -0.008
pH -0.0311 0.002 -20.252 0.000 -0.034 -0.028
Seagram 0.0020 0.000 4.738 0.000 0.001 0.003
SulaVineyards -0.0002 0.000 -0.579 0.563 -0.001 0.001
==============================================================================
Omnibus: 29.494 Durbin-Watson: 1.998
Prob(Omnibus): 0.000 Jarque-Bera (JB): 48.885
Skew: -0.185 Prob(JB): 2.42e-11
Kurtosis: 3.883 Cond. No. 7.67e+04
==============================================================================

Using the training data, we get the following regression equation:

1/𝐴𝑙𝑐𝑜ℎ𝑜𝑙 = −4.667 − 0.004 × 𝐹𝐴 − 0.004 × 𝑉𝐴 − 0.0076 × 𝐶𝐴 − 0.002 × 𝑅𝑆 + 0.016 × 𝑐ℎ𝑙𝑜𝑟𝑖𝑑𝑒 + 0.00001 × 𝐹𝑆𝐷 + 0.00002 × 𝑇𝑆𝐷 + 4.93 × 𝑑𝑒𝑛𝑠𝑖𝑡𝑦 − 0.0098 × 𝑠𝑢𝑙𝑝ℎ𝑎𝑡𝑒 − 0.031 × 𝑝𝐻 + 0.002 × 𝐼(𝐵𝑟𝑎𝑛𝑑 = 𝑆𝑒𝑎𝑔𝑟𝑎𝑚) − 0.0002 × 𝐼(𝐵𝑟𝑎𝑛𝑑 = 𝑆𝑢𝑙𝑎 𝑉𝑖𝑛𝑒𝑦𝑎𝑟𝑑𝑠)

Note that the predictors FSD and 𝐼(𝐵𝑟𝑎𝑛𝑑 = 𝑆𝑢𝑙𝑎 𝑉𝑖𝑛𝑒𝑦𝑎𝑟𝑑𝑠) are not significant in
predicting 1/alcohol.

Earlier, the non-significant predictors were simply eliminated (see Regression monograph).
Now, more formal statistical methods will be used to decide which variable(s) should be
eliminated or retained in the final predictive model.

3.2 Information criteria: AIC and BIC

Once several candidate models are found, it is necessary to be able to compare among them.
There are several options for comparison. Two of those, namely R² and adjusted R², have
already been introduced (see Regression Monograph) and their merits and demerits have been
discussed.
In this section two more criteria are introduced both of which are based on information lost in
fitting a model. A model is a simplification of the process from which the observed data is
generated. The closer the model is to the real process, the more information it contains.
Nevertheless, no model will ever be able to emulate the real process and hence, some amount
of information will always be lost. The errors or residuals of the model fit provide
quantification of the information lost.
Both the information criteria, AIC and BIC, are popular for comparison of models. Both criteria
are based on the error sum of squares and both penalize models on the number of predictors
included. In a way, they try to strike a balance between bias and variance.
Consider the multiple linear regression model with 𝑝 parameters (including the intercept term) on 𝑛 data points, and let the residual sum of squares be denoted by SSE.

The Akaike Information Criteria (AIC) is defined by


𝐴𝐼𝐶 = 𝑛 log(𝑆𝑆𝐸) − 𝑛 log(𝑛) + 2𝑝
However, the AIC tends to overfit models. This criticism of AIC has led to the development of
BIC.
The Bayesian Information Criteria (BIC) is defined by
𝐵𝐼𝐶 = 𝑛 log(𝑆𝑆𝐸) − 𝑛 log(𝑛) + 𝑝 log(𝑛)
Since log(𝑛) is usually much larger than 2, it follows that BIC imposes greater penalty if the
number of parameters, p, is too large. Thus BIC maintains a greater balance in the number of
parameters used to fit the model. In general, BIC is preferred to AIC in model building exercises.
Naturally, the smaller the value of AIC or BIC, the better the model, since the model with the minimum value of these criteria loses the smallest quantity of information.
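
As a minimal sketch of these formulas in code (the SSE values below are toy numbers, not taken from the wine data; because constant terms are dropped, the values are comparable across models fitted to the same data but need not match the aic and bic attributes reported by statsmodels exactly):

import numpy as np

def aic_bic(sse, n, p):
    # AIC and BIC as defined above for a model with p parameters and residual sum of squares sse on n points
    aic = n * np.log(sse) - n * np.log(n) + 2 * p
    bic = n * np.log(sse) - n * np.log(n) + p * np.log(n)
    return aic, bic

# toy comparison: a smaller model with slightly larger SSE vs a larger model with slightly smaller SSE
print(aic_bic(sse=3.2, n=1279, p=5))
print(aic_bic(sse=3.0, n=1279, p=12))

Whichever criterion is used, only differences between models fitted to the same data are meaningful.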

Next the alternative model selection procedures are discussed.

3.3 Forward Selection (FS) Algorithm

The forward selection algorithm for regression model building is an automated procedure that adds one predictor at a time, conditional on which other predictors are already included in the model. The resulting model is not expected to include redundant predictors; the algorithm suggests a minimal set of predictors according to a chosen optimality criterion. Forward selection is one of the most popular algorithms because of its easy interpretability.
Assume there are p available predictors. The steps of the forward selection algorithm are as follows:

• Start with a null (intercept-only) model.
• Perform p simple linear regressions with one predictor at a time. For each model, compute the AIC value. Include the predictor whose model has the smallest AIC value, provided it is smaller than that of the null model. If no such predictor is found, the null model is considered final.
• Assume that a certain predictor, say 𝑋1, did enter the model. At the end of this step the model contains an intercept term and 𝑋1.
• Perform p − 1 multiple linear regressions by adding each of the remaining predictors 𝑋2,…,𝑋𝑝 in turn.
• Check which predictor among 𝑋2,…,𝑋𝑝 gives the smallest AIC in the presence of 𝑋1, smaller than the AIC obtained with just 𝑋1 as predictor. That predictor is included in the model. If no predictor among 𝑋2,…,𝑋𝑝 satisfies the entry criterion, the process stops and the final model includes 𝑋1 as the only predictor for 𝑌.
• This process continues until either all predictors are included or no remaining predictor gives a smaller AIC value than the model determined by the current subset.
Forward selection is performed on the training dataset with response new_resp.

# forward selection
new_resp=1/train_WineData['alcohol']
train_WineData['new_resp'] = new_resp
train_X_WineData = train_WineData.drop(['alcohol','new_resp'],axis = 1)
def forwardSelection_aic(X, y):
    # Forward selection on AIC: starting from the intercept-only model, repeatedly add
    # the predictor that gives the largest drop in AIC; stop when no addition lowers it.
    X["intercept"] = 1
    cols = X.columns.tolist()
    cols = cols[-1:] + cols[:-1]
    X = X[cols]

    iterations_log = ""
    cols = X.columns.tolist()

    selected_cols = ["intercept"]
    remaining_cols = cols.copy()
    remaining_cols.remove("intercept")
    print("Remaining columns : ", remaining_cols)
    model = sm.OLS(y, X[selected_cols]).fit()
    criteria = model.aic
    print("Null model AIC critetia :", criteria)

    while(len(remaining_cols) > 0):
        aic_dict = {}
        for i in range(len(remaining_cols)):
            # try each remaining predictor in turn and record it if it improves the AIC
            new_col = remaining_cols[i]
            selected_cols.append(new_col)
            model = sm.OLS(y, X[selected_cols]).fit()
            new_criteria = model.aic
            if new_criteria < criteria:
                aic_dict[new_col] = new_criteria
            selected_cols.remove(new_col)
        if len(aic_dict) == 0:
            break
        # enter the predictor giving the smallest AIC
        entered_col = sorted(aic_dict.items(), key=lambda x: x[1])[0][0]
        print()
        print("Entered column :", entered_col)
        selected_cols.append(entered_col)
        print("Selected columns :", selected_cols)
        remaining_cols.remove(entered_col)
        print("Remaining columns : ", remaining_cols)
        model = sm.OLS(y, X[selected_cols]).fit()
        criteria = model.aic
        print("New AIC critetia :", criteria)

    print("\n\n")
    print("Final selected columns :", selected_cols)
    print("Final removed columns : ", remaining_cols)
    return selected_cols

frwd_slctn_cols = forwardSelection_aic(train_X_WineData, train_WineData['new_resp'])

Remaining columns : ['FA', 'VA', 'CA', 'RS', 'chloride', 'FSD', 'TSD', 'density', 'pH', 'sulphate', 'Seagram', 'SulaVineyards']
Null model AIC critetia : -8369.496080838044

Entered column : density
Selected columns : ['intercept', 'density']
Remaining columns : ['FA', 'VA', 'CA', 'RS', 'chloride', 'FSD', 'TSD', 'pH', 'sulphate', 'Seagram', 'SulaVineyards']
New AIC critetia : -8686.871678269768

Entered column : FA
Selected columns : ['intercept', 'density', 'FA']
Remaining columns : ['VA', 'CA', 'RS', 'chloride', 'FSD', 'TSD', 'pH', 'sulphate', 'Seagram', 'SulaVineyards']
New AIC critetia : -8896.704471654608

Entered column : pH
Selected columns : ['intercept', 'density', 'FA', 'pH']
Remaining columns : ['VA', 'CA', 'RS', 'chloride', 'FSD', 'TSD', 'sulphate', 'Seagram', 'SulaVineyards']
New AIC critetia : -9202.05736648084

Entered column : RS
Selected columns : ['intercept', 'density', 'FA', 'pH', 'RS']
Remaining columns : ['VA', 'CA', 'chloride', 'FSD', 'TSD', 'sulphate', 'Seagram', 'SulaVineyards']
New AIC critetia : -9509.221077244645

Entered column : sulphate
Selected columns : ['intercept', 'density', 'FA', 'pH', 'RS', 'sulphate']
Remaining columns : ['VA', 'CA', 'chloride', 'FSD', 'TSD', 'Seagram', 'SulaVineyards']
New AIC critetia : -9625.470158323134

Entered column : Seagram
Selected columns : ['intercept', 'density', 'FA', 'pH', 'RS', 'sulphate', 'Seagram']
Remaining columns : ['VA', 'CA', 'chloride', 'FSD', 'TSD', 'SulaVineyards']
New AIC critetia : -9674.361041210175

Entered column : FSD
Selected columns : ['intercept', 'density', 'FA', 'pH', 'RS', 'sulphate', 'Seagram', 'FSD']
Remaining columns : ['VA', 'CA', 'chloride', 'TSD', 'SulaVineyards']
New AIC critetia : -9685.905253086048

Entered column : CA
Selected columns : ['intercept', 'density', 'FA', 'pH', 'RS', 'sulphate', 'Seagram', 'FSD', 'CA']
Remaining columns : ['VA', 'chloride', 'TSD', 'SulaVineyards']
New AIC critetia : -9696.160147228982

Entered column : chloride
Selected columns : ['intercept', 'density', 'FA', 'pH', 'RS', 'sulphate', 'Seagram', 'FSD', 'CA', 'chloride']
Remaining columns : ['VA', 'TSD', 'SulaVineyards']
New AIC critetia : -9704.733713671727

Entered column : VA
Selected columns : ['intercept', 'density', 'FA', 'pH', 'RS', 'sulphate', 'Seagram', 'FSD', 'CA', 'chloride', 'VA']
Remaining columns : ['TSD', 'SulaVineyards']
New AIC critetia : -9711.47318715933

Entered column : TSD
Selected columns : ['intercept', 'density', 'FA', 'pH', 'RS', 'sulphate', 'Seagram', 'FSD', 'CA', 'chloride', 'VA', 'TSD']
Remaining columns : ['SulaVineyards']
New AIC critetia : -9718.548176185843

Final selected columns : ['intercept', 'density', 'FA', 'pH', 'RS', 'sulphate', 'Seagram', 'FSD', 'CA', 'chloride', 'VA', 'TSD']
Final removed columns : ['SulaVineyards']

Start with the null model, which has only the intercept term as the predictor. If at any stage it is seen that inclusion of a new predictor does not improve the AIC value, the process stops.
The null model has AIC −8369.496. Starting from the null model, the single-predictor model with density has the smallest AIC value, −8686.871, which is an improvement on the null model's −8369.496. Therefore, density enters the model.
Similarly, once density is present in the model, the best improvement in AIC value is obtained by including FA. So at the next step the model includes both density and FA. In this way the process continues until no further addition improves the AIC.
For this data set, all predictors except “SulaVineyards” enter the model through the forward selection procedure.
Therefore, the regression equation gets updated: the feature “SulaVineyards” is removed, and the model is refitted as follows.
MLR_wine1 = ols(formula="new_resp~FA+VA+CA+RS+chloride+FSD+TSD+density+sulphate+pH+Seagram", data=train_WineData).fit()

print(MLR_wine1.summary())

OLS Regression Results


==============================================================================
Dep. Variable: new_resp R-squared: 0.658
Model: OLS Adj. R-squared: 0.655
Method: Least Squares F-statistic: 221.3
Date: Sun, 10 Jan 2021 Prob (F-statistic): 1.42e-285
Time: 15:37:00 Log-Likelihood: 4871.3
No. Observations: 1279 AIC: -9719.
Df Residuals: 1267 BIC: -9657.
Df Model: 11
Covariance Type: nonrobust
==============================================================================
coef std err t P>|t| [0.025 0.975]
------------------------------------------------------------------------------
Intercept -4.6612 0.134 -34.798 0.000 -4.924 -4.398
FA -0.0042 0.000 -20.447 0.000 -0.005 -0.004
VA -0.0040 0.001 -3.485 0.001 -0.006 -0.002
CA -0.0076 0.001 -5.559 0.000 -0.010 -0.005
RS -0.0022 0.000 -18.620 0.000 -0.002 -0.002
chloride 0.0163 0.004 4.341 0.000 0.009 0.024
FSD 1.621e-05 2.06e-05 0.786 0.432 -2.43e-05 5.67e-05
TSD 2.073e-05 6.9e-06 3.004 0.003 7.19e-06 3.43e-05
density 4.9243 0.137 35.822 0.000 4.655 5.194
sulphate -0.0098 0.001 -9.554 0.000 -0.012 -0.008
pH -0.0310 0.002 -20.286 0.000 -0.034 -0.028
Seagram 0.0022 0.000 6.264 0.000 0.001 0.003
==============================================================================
Omnibus: 29.768 Durbin-Watson: 1.999
Prob(Omnibus): 0.000 Jarque-Bera (JB): 49.688
Skew: -0.185 Prob(JB): 1.62e-11
Kurtosis: 3.892 Cond. No. 7.65e+04
==============================================================================

1/𝐴𝑙𝑐𝑜ℎ𝑜𝑙 = −4.661 − 0.0042 × 𝐹𝐴 − 0.004 × 𝑉𝐴 − 0.0076 × 𝐶𝐴 − 0.0022 × 𝑅𝑆 + 0.0163 × 𝑐ℎ𝑙𝑜𝑟𝑖𝑑𝑒 + 0.000016 × 𝐹𝑆𝐷 + 0.00002 × 𝑇𝑆𝐷 + 4.924 × 𝑑𝑒𝑛𝑠𝑖𝑡𝑦 − 0.0098 × 𝑠𝑢𝑙𝑝ℎ𝑎𝑡𝑒 − 0.031 × 𝑝𝐻 + 0.0022 × 𝐼(𝐵𝑟𝑎𝑛𝑑 = 𝑆𝑒𝑎𝑔𝑟𝑎𝑚)

3.4 Backward Elimination (BE) Algorithm

Forward selection includes the predictors one by one into the model. Backward elimination is
an algorithm that goes the opposite way. The algorithm is described below.

• Start with the full model.
• Remove each predictor from the model in turn and compute the AIC value of each of these sub-models. The sub-model giving the smallest AIC value, smaller than the AIC value of the full model, is selected; that is, the predictor whose absence created this selected sub-model is eliminated. If this condition is not satisfied, the process stops and the full model is chosen as the final model.
• Suppose at the first step a certain variable 𝑋1 is eliminated from the model. The new (p − 1)-variable model is considered again, and the variable whose removal gives the smallest AIC at this stage is removed, provided that AIC falls below that of the (p − 1)-variable model. Again, if no variable is removed, the (p − 1)-variable model is the final model.
• Continue until there is no predictor left in the model (i.e. the model becomes a null model) or there is no further improvement in AIC values.


The code and output for backward elimination procedure are given below. The process starts
with the full model.

It was seen that the variable satisfying this condition was SulaVineyards. The AIC of the model obtained by removing SulaVineyards is −9718.548, which is smaller than the AIC of the full model (−9716.88). Therefore, SulaVineyards was eliminated at the first stage. Similarly, in the second stage, the AIC of the model obtained by removing FSD is −9719.924, which is smaller than the AIC of the previous model (−9718.548). After removing FSD, no other predictor satisfied the exit criterion. The process stopped, keeping all the predictors except SulaVineyards and FSD in the model.

def backwardElimination_aic(X, y):
    # Backward elimination on AIC: starting from the full model, repeatedly drop the
    # predictor whose removal gives the largest drop in AIC; stop when no removal lowers it.
    X["intercept"] = 1
    cols = X.columns.tolist()
    cols = cols[-1:] + cols[:-1]
    X = X[cols]

    iterations_log = ""
    cols = X.columns.tolist()

    selected_cols = cols
    remaining_cols = cols.copy()
    remaining_cols.remove("intercept")
    model = sm.OLS(y, X[selected_cols]).fit()
    criteria = model.aic
    print("Initial AIC critetia :", criteria)

    while(True):
        rmvd_col_dict = {}
        for i in range(len(remaining_cols)):
            # try removing each predictor in turn and record it if the AIC improves
            temp_rmvd_col = remaining_cols[i]
            selected_cols.remove(temp_rmvd_col)
            model = sm.OLS(y, X[selected_cols]).fit()
            new_criteria = model.aic
            selected_cols.append(temp_rmvd_col)
            if new_criteria < criteria:
                rmvd_col_dict[temp_rmvd_col] = new_criteria
        if len(rmvd_col_dict) == 0:
            break
        print()
        print("Columns eligible for elimination:", rmvd_col_dict)
        eliminated_col = sorted(rmvd_col_dict.items(), key=lambda x: x[1])[0][0]
        print("Removed col with least AIC:", eliminated_col)
        selected_cols.remove(eliminated_col)
        remaining_cols.remove(eliminated_col)
        model = sm.OLS(y, X[selected_cols]).fit()
        criteria = model.aic
        print("Updated AIC criteria : ", criteria)
        print(selected_cols)

    print("\n\n")
    print("Final selected columns :", selected_cols)
    return selected_cols

bckwrd_elmtn_cols = backwardElimination_aic(train_X_WineData,train_WineData['new_resp'])

Initial AIC critetia : -9716.886654499593

Columns eligible for elimination: {'FSD': -9718.187479229922, 'SulaVineyards': -9718.548176185837}
Removed col with least AIC: SulaVineyards
Updated AIC criteria : -9718.548176185837
['intercept', 'FA', 'VA', 'CA', 'RS', 'chloride', 'FSD', 'TSD', 'density', 'pH', 'sulphate', 'Seagram']

Columns eligible for elimination: {'FSD': -9719.924834061516}
Removed col with least AIC: FSD
Updated AIC criteria : -9719.92483406152
['intercept', 'FA', 'VA', 'CA', 'RS', 'chloride', 'TSD', 'density', 'pH', 'sulphate', 'Seagram']

Final selected columns : ['intercept', 'FA', 'VA', 'CA', 'RS', 'chloride', 'TSD', 'density', 'pH', 'sulphate', 'Seagram']

Backward elimination suggests that FA, VA, CA, RS, chloride, TSD, density, sulphate, pH, and Seagram are enough to be used as predictors, and in their presence the variables FSD and SulaVineyards are redundant.
MLR_wine1 = ols(formula="new_resp~FA+VA+CA+RS+chloride+TSD+density+sulphate+pH+Seagram", data=train_WineData).fit()
print(MLR_wine1.summary())

OLS Regression Results


==============================================================================
Dep. Variable: new_resp R-squared: 0.658
Model: OLS Adj. R-squared: 0.655
Method: Least Squares F-statistic: 243.4
Date: Sun, 10 Jan 2021 Prob (F-statistic): 1.21e-286
Time: 15:53:56 Log-Likelihood: 4871.0
No. Observations: 1279 AIC: -9720.
Df Residuals: 1268 BIC: -9663.
Df Model: 10
Covariance Type: nonrobust
==============================================================================
coef std err t P>|t| [0.025 0.975]
------------------------------------------------------------------------------
Intercept -4.6515 0.133 -34.877 0.000 -4.913 -4.390
FA -0.0042 0.000 -20.494 0.000 -0.005 -0.004
VA -0.0041 0.001 -3.661 0.000 -0.006 -0.002
CA -0.0078 0.001 -5.806 0.000 -0.010 -0.005
RS -0.0022 0.000 -18.700 0.000 -0.002 -0.002
chloride 0.0165 0.004 4.418 0.000 0.009 0.024
TSD 2.439e-05 5.1e-06 4.779 0.000 1.44e-05 3.44e-05
density 4.9141 0.137 35.913 0.000 4.646 5.182
sulphate -0.0097 0.001 -9.532 0.000 -0.012 -0.008
pH -0.0309 0.002 -20.365 0.000 -0.034 -0.028
Seagram 0.0022 0.000 6.225 0.000 0.001 0.003
==============================================================================
Omnibus: 30.227 Durbin-Watson: 1.999
Prob(Omnibus): 0.000 Jarque-Bera (JB): 50.976
Skew: -0.184 Prob(JB): 8.52e-12
Kurtosis: 3.906 Cond. No. 7.30e+04
==============================================================================

The regression equation becomes

1/𝐴𝑙𝑐𝑜ℎ𝑜𝑙 = −4.65 − 0.0042 × 𝐹𝐴 − 0.0041 × 𝑉𝐴 − 0.0078 × 𝐶𝐴 − 0.0022 × 𝑅𝑆 + 0.0165 × 𝑐ℎ𝑙𝑜𝑟𝑖𝑑𝑒 + 0.000024 × 𝑇𝑆𝐷 + 4.914 × 𝑑𝑒𝑛𝑠𝑖𝑡𝑦 − 0.0097 × 𝑠𝑢𝑙𝑝ℎ𝑎𝑡𝑒 − 0.0309 × 𝑝𝐻 + 0.0022 × 𝐼(𝐵𝑟𝑎𝑛𝑑 = 𝑆𝑒𝑎𝑔𝑟𝑎𝑚)

Several important points need to be noted here.

1. At every stage of the FS or BE procedure, the candidate predictor to enter or to be removed depends on the set of predictors already in the model. Both procedures may be modified so that one or more predictors are forced into the model, even if they are not otherwise significant. Under such a condition, the final model will be different from the one obtained if the process is run without any constraint.
2. There is no guarantee that FS and BE will propose identical models. The current data set is an example of that.
3. Here the final model is developed using AIC, and none of the other model optimality criteria, such as multiple R² or adjusted R², is used. Alternative criteria, such as BIC or adjusted R², can also be used to control the entry and exit of predictors. Use of different criteria will result in different models.

3.5 Stepwise Regression

Stepwise regression is a combination of the Forward Selection and Backward Elimination procedures.
• Start with the null model.
• At the next step, use forward selection to select the predictor giving the smallest AIC value, smaller than that of the null model. If this criterion is not satisfied, the null model is selected as the final model.
• Suppose a predictor 𝑋1 entered the model. For each of the remaining predictors, include it and compute the AIC; also compute the AIC of the model obtained by removing 𝑋1. Rank these candidate moves in increasing order of AIC. If there is a variable whose entry or exit improves the AIC of the existing model, include or eliminate it accordingly.
• Continue until all predictors are included or there is no further improvement of AIC.
It was seen that the outcome of stepwise regression was identical to that of backward elimination, i.e. only SulaVineyards and FSD were removed from the set of predictors to build the final model.
def combine_fs_be_stepaic(X, y):
    # Stepwise selection on AIC, combining forward selection and backward elimination:
    # at each stage the candidate entries and the candidate eliminations are compared,
    # and the single move giving the lowest AIC is made; stop when no move improves it.
    X["intercept"] = 1
    cols = X.columns.tolist()
    cols = cols[-1:] + cols[:-1]
    X = X[cols]

    iterations_log = ""
    cols = X.columns.tolist()

    selected_cols = ["intercept"]
    remaining_cols = cols.copy()
    remaining_cols.remove("intercept")
    print("Remaining columns : ", remaining_cols)
    model = sm.OLS(y, X[selected_cols]).fit()
    criteria = model.aic
    print("Null model AIC critetia :", criteria)

    # first forward-selection step from the null model
    temp_dict = {}
    for i in range(len(remaining_cols)):
        new_col = remaining_cols[i]
        selected_cols.append(new_col)
        model = sm.OLS(y, X[selected_cols]).fit()
        new_criteria = model.aic
        if new_criteria < criteria:
            temp_dict[new_col] = new_criteria
        selected_cols.remove(new_col)
    entered_col = sorted(temp_dict.items(), key=lambda x: x[1])[0][0]
    entered_aic = sorted(temp_dict.items(), key=lambda x: x[1])[0][1]
    print()
    print("Entered column :", entered_col)
    selected_cols.append(entered_col)
    temp_dict = {}
    temp_dict[entered_col] = entered_aic
    first_stage = True
    print("Selected columns :", selected_cols)
    remaining_cols.remove(entered_col)
    print(temp_dict)

    while(True):
        if first_stage:
            aic_dict = temp_dict
        else:
            first_stage = False
            aic_dict = {}
        flag = False

        # backward-elimination candidates: columns whose removal improves the AIC
        be_remaining_cols = selected_cols.copy()
        be_selected_cols = selected_cols.copy()
        be_remaining_cols.remove('intercept')
        for i in range(len(be_remaining_cols)):
            temp_rmvd_col = be_remaining_cols[i]
            be_selected_cols.remove(temp_rmvd_col)
            model = sm.OLS(y, X[be_selected_cols]).fit()
            new_criteria = model.aic
            be_selected_cols.append(temp_rmvd_col)
            if new_criteria < criteria:
                aic_dict[temp_rmvd_col] = new_criteria
                flag = True
        print()
        col_elig_fr_elim = None
        if flag:
            col_elig_fr_elim = sorted(aic_dict.items(), key=lambda x: x[1])[0][0]
            print("Column that is eligible for elimination:", col_elig_fr_elim)
        else:
            print('No colums eligible for BE in this stage')

        # forward-selection candidates: columns whose inclusion improves the AIC
        for i in range(len(remaining_cols)):
            new_col = remaining_cols[i]
            selected_cols.append(new_col)
            model = sm.OLS(y, X[selected_cols]).fit()
            new_criteria = model.aic
            if new_criteria < criteria:
                aic_dict[new_col] = new_criteria
            selected_cols.remove(new_col)

        # make the single move (entry or elimination) with the lowest AIC
        lowest_aic_col = sorted(aic_dict.items(), key=lambda x: x[1])[0][0]
        prev_criteria = criteria
        if lowest_aic_col == col_elig_fr_elim:
            selected_cols.remove(lowest_aic_col)
            remaining_cols.append(lowest_aic_col)
            model = sm.OLS(y, X[selected_cols]).fit()
            criteria = model.aic
            if prev_criteria <= criteria:
                selected_cols.append(lowest_aic_col)
                remaining_cols.remove(lowest_aic_col)
                break
            print("Eliminated column :", lowest_aic_col)
        else:
            selected_cols.append(lowest_aic_col)
            remaining_cols.remove(lowest_aic_col)
            model = sm.OLS(y, X[selected_cols]).fit()
            criteria = model.aic
            if prev_criteria <= criteria:
                selected_cols.remove(lowest_aic_col)
                remaining_cols.append(lowest_aic_col)
                break
            print("Entered column :", lowest_aic_col)
        print("Selected columns :", selected_cols)
        print("Update AIC criteria : ", criteria)

    return selected_cols

combine_fs_be_stepaic(train_X_WineData,train_WineData['new_resp'])
[email protected]
21YORICED7 Remaining columns : ['FA', 'VA', 'CA', 'RS', 'chloride', 'FSD', 'TSD', 'density', 'pH', 'su
lphate', 'Seagram', 'SulaVineyards']
Null model AIC critetia : -8369.496080838044

Entered column : density


Selected columns : ['intercept', 'density']
{'density': -8686.871678269768}

No columns eligible for BE in this stage


Entered column : FA
Selected columns : ['intercept', 'density', 'FA']
Update AIC criteria : -8896.704471654608

No columns eligible for BE in this stage


Entered column : pH
Selected columns : ['intercept', 'density', 'FA', 'pH']
Update AIC criteria : -9202.05736648084

No columns eligible for BE in this stage


Entered column : RS
Selected columns : ['intercept', 'density', 'FA', 'pH', 'RS']
Update AIC criteria : -9509.221077244645

No columns eligible for BE in this stage


Entered column : sulphate
Selected columns : ['intercept', 'density', 'FA', 'pH', 'RS', 'sulphate']
Update AIC criteria : -9625.470158323134

No columns eligible for BE in this stage


Entered column : Seagram
Selected columns : ['intercept', 'density', 'FA', 'pH', 'RS', 'sulphate', 'Seagram']
Update AIC criteria : -9674.361041210175

No columns eligible for BE in this stage

Entered column : FSD
Selected columns : ['intercept', 'density', 'FA', 'pH', 'RS', 'sulphate', 'Seagram', 'FSD']
Update AIC criteria : -9685.905253086048

No columns eligible for BE in this stage


Entered column : CA
Selected columns : ['intercept', 'density', 'FA', 'pH', 'RS', 'sulphate', 'Seagram', 'FSD',
'CA']
Update AIC criteria : -9696.160147228982

No columns eligible for BE in this stage


Entered column : chloride
Selected columns : ['intercept', 'density', 'FA', 'pH', 'RS', 'sulphate', 'Seagram', 'FSD',
'CA', 'chloride']
Update AIC criteria : -9704.733713671727

No columns eligible for BE in this stage


Entered column : VA
Selected columns : ['intercept', 'density', 'FA', 'pH', 'RS', 'sulphate', 'Seagram', 'FSD',
'CA', 'chloride', 'VA']
Update AIC criteria : -9711.47318715933

No columns eligible for BE in this stage


Entered column : TSD
Selected columns : ['intercept', 'density', 'FA', 'pH', 'RS', 'sulphate', 'Seagram', 'FSD',
'CA', 'chloride', 'VA', 'TSD']
Update AIC criteria : -9718.548176185843

Column that is eligible for elimination: FSD


Eliminated column : FSD
Selected columns : ['intercept', 'density', 'FA', 'pH', 'RS', 'sulphate', 'Seagram', 'CA', 'chloride', 'VA', 'TSD']
Update AIC criteria : -9719.924834061512

Final selected columns : ['intercept', 'density', 'FA', 'pH', 'RS', 'sulphate', 'Seagram', '
CA', 'chloride', 'VA', 'TSD']

The regression equation therefore becomes

1/𝐴𝑙𝑐𝑜ℎ𝑜𝑙 = −4.65 − 0.0042 × 𝐹𝐴 − 0.0041 × 𝑉𝐴 − 0.0078 × 𝐶𝐴 − 0.0022 × 𝑅𝑆 + 0.0165 × 𝑐ℎ𝑙𝑜𝑟𝑖𝑑𝑒 + 0.000024 × 𝑇𝑆𝐷 + 4.914 × 𝑑𝑒𝑛𝑠𝑖𝑡𝑦 − 0.0097 × 𝑠𝑢𝑙𝑝ℎ𝑎𝑡𝑒 − 0.0309 × 𝑝𝐻 + 0.0022 × 𝐼(𝐵𝑟𝑎𝑛𝑑 = 𝑆𝑒𝑎𝑔𝑟𝑎𝑚)

Stepwise Regression is not discussed further, since its output is identical to that of Backward
Elimination: every computation performed with the model obtained from Backward Elimination
applies equally to the model obtained from Stepwise Regression.

3.6 All Possible Regression or Regression Subset Selection and Mallows’ Cp Criterion

Mallows’ Cp is a statistic that measures the bias introduced when only a subset of the available
predictors is used in a multiple linear regression model.

Consider, as before, a multiple linear regression model with response 𝑌 and a set of predictors
{𝑋1, 𝑋2, …, 𝑋𝑘}. With 𝑘 predictors, 2^𝑘 different multiple linear regression models may be fit.
For example, if 𝑘 = 10, then 2^10 = 1024 different multiple linear regression models (including
the null model) may be fit.

If a set of p-1 predictors is chosen out of this set so that there are in total p parameters (including
intercept), then Mallows’ Cp is computed by
𝐶𝑝 = 𝑆𝑆𝐸𝑝 / 𝑀𝑆𝐸𝑎𝑙𝑙 − (𝑛 − 2𝑝)
where n is the number of observations in the data set. Here 𝑆𝑆𝐸𝑝 is the residual sum of squares
obtained when the response is modelled with the p-1 chosen predictors.
If the full model is fitted, using all the 𝑘 predictors, then there will be 𝑘+1 parameters. Then
𝑀𝑆𝐸𝑎𝑙𝑙 or Mean Squared Error for the full model is computed by dividing the residual sum of
squares (SSE) by 𝑛−(𝑘+1).
If the chosen subset works well, the numerical value of 𝐶𝑝 is expected to be close to 𝑝; for a
potentially good model, 𝐶𝑝 ≤ 𝑝. The full model always has Mallows’ Cp equal to 𝑘 + 1. The aim is
to select the smallest 𝑝, and the corresponding subset of predictors, for which 𝐶𝑝 is smaller than
but closest to 𝑝.
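
A minimal sketch of this computation is given below. It is illustrative only: the helper mallows_cp is not part of the monograph's code, and it assumes design matrices that already contain an intercept column.

import statsmodels.api as sm

def mallows_cp(y, X_full, X_subset):
    # X_full and X_subset include the intercept column, so X_subset.shape[1] equals p
    n = len(y)
    mse_all = sm.OLS(y, X_full).fit().ssr / (n - X_full.shape[1])   # MSE of the full model
    sse_p = sm.OLS(y, X_subset).fit().ssr                           # SSE of the subset model
    p = X_subset.shape[1]
    return sse_p / mse_all - (n - 2 * p)

By construction, passing the full design matrix as the subset returns Cp = k + 1 exactly, in line with the statement above.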

This procedure is also known as All Possible Regressions or Regression Subset Selection.
However, even for a moderately large 𝑘 it is not feasible to fit all 2^𝑘 subsets of predictors;
typically, only a few models at various subset sizes are compared.
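
When k is too large for an exhaustive search, a sequential search is a common compromise; the monograph itself does not use one, but the code comment below mentions mlxtend's SFS alongside EFS. The sketch that follows is only an illustration of that alternative, assuming the same design matrix and transformed response used elsewhere in this section.

from sklearn.linear_model import LinearRegression
from mlxtend.feature_selection import SequentialFeatureSelector as SFS

# Floating forward selection: a greedy, much cheaper approximation to the all-subsets search
sfs = SFS(LinearRegression(),
          k_features='best',
          forward=True,
          floating=True,
          scoring='neg_mean_squared_error',
          cv=5)
sfs = sfs.fit(train_X_WineData.values, train_WineData['new_resp'].values)
print('Best subset:', sfs.k_feature_idx_)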

One advantage of this procedure is that the candidate models are not conditional on which
predictors are already in the model. For the algorithms discussed previously, i.e. FS, BE and
Stepwise Regression, the next predictor to be included or eliminated depends on which other
predictors are already in the model. In the All Possible Regression method there is no such
dependency.

from sklearn.linear_model import LinearRegression
from mlxtend.feature_selection import ExhaustiveFeatureSelector as EFS

X = train_X_WineData
y = train_WineData['new_resp']

# Perform an exhaustive search. EFS (like SFS) uses the 'neg_mean_squared_error' scorer;
# this is simply the mean squared error with a negative sign.
lr = LinearRegression()
efs1 = EFS(lr,
           min_features=1,
           max_features=13,
           scoring='neg_mean_squared_error',
           print_progress=True,
           cv=5)

# Fit the exhaustive feature selector
efs1 = efs1.fit(X.values, y.values)
print('Best negative mean squared error: %.2f' % efs1.best_score_)
# Print the indices of the best feature subset
print('Best subset:', efs1.best_idx_)
Features: 8191/8191

Best negative mean squared error: -0.00
Best subset: (0, 1, 2, 3, 4, 6, 7, 8, 9, 10, 12)

X.columns[[0, 1, 2, 3, 4, 6, 7, 8, 9, 10, 12]]


Index(['FA', 'VA', 'CA', 'RS', 'chloride', 'TSD', 'density', 'pH', 'sulphate',
'Seagram', 'intercept'], dtype='object')

There are 12 predictors (treating Brand Seagram and Brand SulaVineyards as separate dummy
predictors), so along with the intercept term the exhaustive search considers 13 columns, i.e.
2^13 − 1 = 8191 non-empty subsets. The model with 1 parameter is the null model containing only
the intercept term, and the model with 13 parameters contains all the predictors along with the
intercept term.
At every choice of the number of predictors (ranging from 1 to 12), the ‘best’ model according
to Mallows’ Cp criterion is identified, as sketched below.
The best model therefore contains 11 columns (the intercept plus 10 predictors): all variables
except FSD and Brand SulaVineyards.
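
A brute-force version of that per-size search could look like the sketch below. It is illustrative only: it reuses the hypothetical mallows_cp helper sketched in the previous subsection and assumes that train_X_WineData already carries the intercept column added by the stepwise function above.

from itertools import combinations

X_full = train_X_WineData                      # includes the intercept column
y = train_WineData['new_resp']
predictors = [c for c in X_full.columns if c != 'intercept']

best_by_size = {}
for size in range(1, len(predictors) + 1):
    # Evaluate every subset of this size and keep the one with the smallest Cp
    scored = [(mallows_cp(y, X_full, X_full[['intercept'] + list(subset)]), subset)
              for subset in combinations(predictors, size)]
    best_by_size[size] = min(scored, key=lambda t: t[0])

for size, (cp, subset) in best_by_size.items():
    print(size, round(cp, 2), subset)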

MLR_wine1 =
ols(formula="new_resp~FA+VA+CA+RS+chloride+TSD+density+sulphate+pH+Seagram",data=train_W
ineData).fit()
print(MLR_wine1.summary())
OLS Regression Results
==============================================================================
Dep. Variable: new_resp R-squared: 0.658
Model: OLS Adj. R-squared: 0.655
Method:
[email protected] Least Squares F-statistic: 243.4
21YORICED7 Date: Sun, 10 Jan 2021 Prob (F-statistic): 1.21e-286
Time: 15:53:56 Log-Likelihood: 4871.0
No. Observations: 1279 AIC: -9720.
Df Residuals: 1268 BIC: -9663.
Df Model: 10
Covariance Type: nonrobust
==============================================================================
coef std err t P>|t| [0.025 0.975]
------------------------------------------------------------------------------
Intercept -4.6515 0.133 -34.877 0.000 -4.913 -4.390
FA -0.0042 0.000 -20.494 0.000 -0.005 -0.004
VA -0.0041 0.001 -3.661 0.000 -0.006 -0.002
CA -0.0078 0.001 -5.806 0.000 -0.010 -0.005
RS -0.0022 0.000 -18.700 0.000 -0.002 -0.002
chloride 0.0165 0.004 4.418 0.000 0.009 0.024
TSD 2.439e-05 5.1e-06 4.779 0.000 1.44e-05 3.44e-05
density 4.9141 0.137 35.913 0.000 4.646 5.182
sulphate -0.0097 0.001 -9.532 0.000 -0.012 -0.008
pH -0.0309 0.002 -20.365 0.000 -0.034 -0.028
Seagram 0.0022 0.000 6.225 0.000 0.001 0.003
==============================================================================
Omnibus: 30.227 Durbin-Watson: 1.999
Prob(Omnibus): 0.000 Jarque-Bera (JB): 50.976
Skew: -0.184 Prob(JB): 8.52e-12
Kurtosis: 3.906 Cond. No. 7.30e+04
==============================================================================

The regression equation becomes:

1/𝐴𝑙𝑐𝑜ℎ𝑜𝑙 = −4.65 − 0.0042 × 𝐹𝐴 − 0.0041 × 𝑉𝐴 − 0.0078 × 𝐶𝐴 − 0.0022 × 𝑅𝑆 + 0.0165 × 𝑐ℎ𝑙𝑜𝑟𝑖𝑑𝑒 + 0.000024 × 𝑇𝑆𝐷 + 4.914 × 𝑑𝑒𝑛𝑠𝑖𝑡𝑦 − 0.0097 × 𝑠𝑢𝑙𝑝ℎ𝑎𝑡𝑒 − 0.0309 × 𝑝𝐻 + 0.0022 × 𝐼(𝐵𝑟𝑎𝑛𝑑 = 𝑆𝑒𝑎𝑔𝑟𝑎𝑚)
We can see that all the predictors are significant.
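
This can be confirmed programmatically from the fitted model; the two lines below are only an illustrative check using the MLR_wine1 object created above.

# Every p-value is below the usual 5% level, confirming that all retained predictors are significant
print(MLR_wine1.pvalues.round(4))
print((MLR_wine1.pvalues < 0.05).all())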

Note again that All Possible Regression provides a third route to a candidate model for predicting
alcohol (%); for this data set it selects the same predictors as Backward Elimination and
Stepwise Regression.

3.7 Choosing the Final Model


Once one or more candidate models have been identified using the training data, the final step
in the model selection procedure is validation on test data. In this case, all four candidate
models (from Forward Selection, Backward Elimination, Stepwise Regression and All Possible
Regression) are considered for validation.
There are multiple criteria for model validation; RMSE is chosen here because of its easy
interpretation.
from sklearn.metrics import mean_squared_error

# Forward Selection model
cols = ['density', 'FA', 'pH', 'RS', 'sulphate', 'Seagram', 'FSD', 'CA', 'chloride', 'VA', 'TSD']
X = train_X_WineData[cols]
X_test = test_WineData[cols]
y = train_WineData['new_resp']
y_test1 = 1/y_test      # transformed response (1/Alcohol) for the test set
MLR_wine_fs = LinearRegression().fit(X, y)
y_pred = MLR_wine_fs.predict(X_test)
print("Forward Selection RMSE :", np.sqrt(mean_squared_error(y_test1, y_pred)))
Forward Selection RMSE : 0.005322077111287733

# Backward Elimination / Stepwise Regression / Mallows' Cp model
cols = ['density', 'FA', 'pH', 'RS', 'sulphate', 'Seagram', 'CA', 'chloride', 'VA', 'TSD']
X = train_X_WineData[cols]
X_test = test_WineData[cols]
y = train_WineData['new_resp']
MLR_wine_be = LinearRegression().fit(X, y)
y_pred = MLR_wine_be.predict(X_test)
print("Backward Elimination / Stepwise / Mallows' Cp RMSE :", np.sqrt(mean_squared_error(y_test1, y_pred)))
Backward Elimination / Stepwise / Mallows' Cp RMSE : 0.005317362696545827

Since Backward Elimination, Stepwise Regression and Mallows’ Cp select the same set of
predictors, a single RMSE serves for all three, as shown above.
The two RMSE values are again comparable, but this time the model obtained from Backward
Elimination (equivalently Stepwise Regression and Mallows’ Cp) beats the Forward Selection
model.
Hence, we choose the model from Backward Elimination as our final model.

The final regression equation is

1/𝐴𝑙𝑐𝑜ℎ𝑜𝑙 = −4.65 − 0.0042 × 𝐹𝐴 − 0.0041 × 𝑉𝐴 − 0.0078 × 𝐶𝐴 − 0.0022 × 𝑅𝑆 + 0.0165 × 𝑐ℎ𝑙𝑜𝑟𝑖𝑑𝑒 + 0.000024 × 𝑇𝑆𝐷 + 4.914 × 𝑑𝑒𝑛𝑠𝑖𝑡𝑦 − 0.0097 × 𝑠𝑢𝑙𝑝ℎ𝑎𝑡𝑒 − 0.0309 × 𝑝𝐻 + 0.0022 × 𝐼(𝐵𝑟𝑎𝑛𝑑 = 𝑆𝑒𝑎𝑔𝑟𝑎𝑚)

Note that the RMSE for this model is 0.00531 on the test data, and the RMSE for the same model
on the train data is approximately 0.00537.

y_train_pred = MLR_wine_be.predict(X)
print("Train RMSE :", np.sqrt(mean_squared_error(y, y_train_pred)))
Train RMSE : 0.005367713537929523

The closeness of these two values suggests that the final model does not overfit the training
data and generalizes well, and hence is a good fit.
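
As an additional check, in the spirit of the cross-validation discussion in Section 2.5, a k-fold cross-validated RMSE could be computed on the training data for the chosen feature set. The sketch below is illustrative only; it reuses the X and y from the Backward Elimination validation above and assumes a scikit-learn version in which the 'neg_root_mean_squared_error' scorer is available.

from sklearn.model_selection import cross_val_score

# 5-fold cross-validated RMSE for the final (Backward Elimination) feature set
cv_scores = cross_val_score(LinearRegression(), X, y,
                            scoring='neg_root_mean_squared_error', cv=5)
print("5-fold CV RMSE :", -cv_scores.mean())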

[email protected]
21YORICED7

32
Proprietary content. ©Great Learning. All Rights Reserved. Unauthorized use or distribution prohibited.

This file is meant for personal use by [email protected] only.


Sharing or publishing the contents in part or full is liable for legal action.
Flowchart for Regression Model Selection

1. Split the data into training/test sets, or prepare for cross-validation.
2. Follow all the initial steps for MLR on the training data.
3. Check the LINE assumptions; if they are not satisfied, transform Y and recheck.
4. Adopt one or more regression model building procedures.
5. Compare candidate models: AIC, BIC, Cp, PRESS, etc.
6. Choose one or more candidate models.
7. Validate the model(s) on test data.
8. Choose the final model.


[email protected]
21YORICED7

34
Proprietary content. ©Great Learning. All Rights Reserved. Unauthorized use or distribution prohibited.

This file is meant for personal use by [email protected] only.


Sharing or publishing the contents in part or full is liable for legal action.

You might also like