Improved LARS Algorithm for Adaptive LASSO in the Linear Regression Model
Abstract
The adaptive LASSO has been used for consistent variable selection in place of LASSO in the linear re-
gression model. In this article, we propose a modified LARS algorithm to combine adaptive LASSO with some
biased estimators, namely the Almost Unbiased Ridge Estimator (AURE), Liu Estimator (LE), Almost Unbiased
Liu Estimator (AULE), Principal Component Regression Estimator (PCRE), r-k class estimator, and r-d class
estimator. Furthermore, we examine the performance of the proposed algorithm using a Monte Carlo simulation
study and real-world examples.
Keywords: Adaptive LASSO, LARS, Biased estimators, Monte Carlo simulation.
1. Introduction
Let us consider a linear regression model
y = Xβ + ε. (1.1)
Here, y represents the n × 1 vector of observations on the dependent variable, X is the n × p matrix
of observations on the non-stochastic predictor variables, β stands for a p × 1 vector of unknown
coefficients, and ε denotes the n × 1 vector of random error terms. These errors are assumed to be
independent and identically normally distributed with mean zero and common variance σ 2 .
It is widely acknowledged that the Ordinary Least Squares Estimator (OLSE) serves as the Best
Linear Unbiased Estimator (BLUE) for determining the unknown parameter vector in model (1.1),
expressed as
βˆ OLSE = (X′X)⁻¹X′y. (1.2)
Nevertheless, the OLSE demonstrates instability and yields parameter estimates with high vari-
ance in the presence of multicollinearity within X . To mitigate this multicollinearity issue, many
researchers resort to biased estimators.
As per Kayanan and Wijekoon (2017), the generalized representation of biased estimators includ-
ing Ridge Estimator (RE), Almost Unbiased Ridge Estimator (AURE), Liu Estimator (LE), Almost
Unbiased Liu Estimator (AULE), Principal Component Regression Estimator (PCRE), r-k class esti-
mator and r-d class estimator can be expressed as:
βˆ G = G βˆ OLSE (1.3)
where
βˆ G = βˆ RE    if G = (X′X + kI)⁻¹X′X,
βˆ G = βˆ AURE  if G = I − k²(X′X + kI)⁻²,
βˆ G = βˆ LE    if G = (X′X + I)⁻¹(X′X + dI),
βˆ G = βˆ AULE  if G = I − (1 − d)²(X′X + I)⁻²,
βˆ G = βˆ PCRE  if G = T_h T_h′,
βˆ G = βˆ rk    if G = T_h T_h′(X′X + kI)⁻¹X′X,
βˆ G = βˆ rd    if G = T_h T_h′(X′X + I)⁻¹(X′X + dI),
and T_h is the matrix whose columns are the eigenvectors of X′X corresponding to its h largest eigenvalues, with k > 0 and 0 < d < 1 as the respective regularization parameters.
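To make the role of G concrete, the following is a minimal sketch of how these matrices can be formed numerically; the function name make_G, the default values of k, d and h, and the use of NumPy are our own illustrative choices rather than part of the original presentation.

```python
import numpy as np

def make_G(X, estimator, k=0.1, d=0.5, h=None):
    """Build the matrix G of (1.3) for the listed estimators (illustrative sketch)."""
    p = X.shape[1]
    S = X.T @ X
    I = np.eye(p)
    h = p if h is None else h
    # T_h: eigenvectors of X'X belonging to the h largest eigenvalues
    eigval, eigvec = np.linalg.eigh(S)
    T_h = eigvec[:, np.argsort(eigval)[::-1][:h]]
    inv_Sk = np.linalg.inv(S + k * I)
    inv_S1 = np.linalg.inv(S + I)
    if estimator == "RE":
        return inv_Sk @ S
    if estimator == "AURE":
        return I - k**2 * inv_Sk @ inv_Sk
    if estimator == "LE":
        return inv_S1 @ (S + d * I)
    if estimator == "AULE":
        return I - (1 - d)**2 * inv_S1 @ inv_S1
    if estimator == "PCRE":
        return T_h @ T_h.T
    if estimator == "rk":
        return T_h @ T_h.T @ inv_Sk @ S
    if estimator == "rd":
        return T_h @ T_h.T @ inv_S1 @ (S + d * I)
    raise ValueError("unknown estimator")

# Example: beta_G = make_G(X, "rd", d=0.9, h=5) @ beta_OLSE, as in (1.3)
```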
Kayanan and Wijekoon (2017) demonstrated that the r-k class estimator and r-d class estimator
outperform other estimators within a specified range of regularization parameter values when multi-
collinearity exists among the predictor variables. However, biased estimators can introduce substantial
bias when the number of predictor variables is high, potentially leading to the inclusion of irrelevant
predictor variables in the final model. To address this issue, Tibshirani (1996) proposed the Least
Absolute Shrinkage and Selection Operator (LASSO) as
βˆ LASSO = arg min_β (y − Xβ)′(y − Xβ) subject to ∑_{j=1}^{p} |β_j| ≤ t, (1.4)
where t ≥ 0 is a tuning parameter. The LASSO solutions are obtained by the Least Angle Regression (LARS) algorithm (Efron et al., 2004).
According to Zou and Hastie (2005), LASSO fails to outperform the Ridge Estimator when high multicollinearity exists among the predictors, and it is unstable when the number of predictors exceeds the number of observations. To overcome this problem, Zou and Hastie (2005) proposed the Elastic Net (ENet) estimator by combining LASSO and RE as
βˆ ENet = arg min_β {(y − Xβ)′(y − Xβ) + k ∑_{j=1}^{p} β_j²} subject to ∑_{j=1}^{p} |β_j| ≤ t. (1.5)
The LARS-EN algorithm, which is a modified version of the LARS-LASSO algorithm, has been used
to obtain solutions for ENet.
Further, Zou and Hastie (2005) noted that LASSO selects variables without regard to their relative importance when there is a group of variables among which the pairwise correlations are very high.
To handle this problem, Zou (2006) proposed the adaptive LASSO by assigning different weights to the regression coefficients in the L1 penalty of LASSO. Taking the weight vector ŵ = |βˆ OLSE|^{−α} for some α > 0, the adaptive LASSO is defined as
βˆ adpLASSO = arg min_β (y − Xβ)′(y − Xβ) subject to ∑_{j=1}^{p} |ŵ_j β_j| ≤ t. (1.6)
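As a brief illustration of how the weighted constraint in (1.6) is typically handled, the sketch below uses the rescaling device of Zou (2006): an ordinary LASSO is solved on the rescaled predictors X diag(1/ŵ) and the result is divided by ŵ. The penalized (Lagrangian) form and the scikit-learn solver appear here only for convenience; they are not part of the proposed method, which solves the problem with a LARS-type algorithm.

```python
import numpy as np
from sklearn.linear_model import Lasso  # illustrative stand-in for a LARS-type solver

def adaptive_lasso(X, y, alpha=1.0, lam=0.1):
    """Adaptive LASSO via the rescaling device: sketch only."""
    # weights w = |beta_OLSE|^(-alpha), as in (1.6)
    beta_ols = np.linalg.lstsq(X, y, rcond=None)[0]
    w = np.abs(beta_ols) ** (-alpha)
    # solve an ordinary LASSO on X* = X diag(1/w) (penalized form, for convenience)
    X_star = X / w
    fit = Lasso(alpha=lam, fit_intercept=False).fit(X_star, y)
    # back-transform to the original scale: beta_adp = beta / w
    return fit.coef_ / w
```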
In addition, Zou and Zhang (2009) proposed the adaptive ENet estimator by combining the adaptive LASSO and RE, defined as
βˆ adpENet = arg min_β {(y − Xβ)′(y − Xβ) + k ∑_{j=1}^{p} β_j²} subject to ∑_{j=1}^{p} |ŵ_j β_j| ≤ t. (1.7)
Kayanan and Wijekoon (2020) proposed the generalized LARS (GLARS) algorithm, which combines LASSO with biased estimators such as RE, AURE, LE, AULE, PCRE, the r-k class estimator and the r-d class estimator. Further, they showed that the combination of LASSO and the r-d class estimator performs well in the high-dimensional linear regression model when high multicollinearity exists among the predictor variables.
In this article, we propose an improved version of the GLARS algorithm that combines the adaptive LASSO with other biased estimators, namely AURE, LE, AULE, PCRE, and the r-k and r-d class estimators. Further, we compare the prediction performance of the proposed algorithm with the existing adaptive LASSO and adaptive ENet algorithms using a Monte Carlo simulation study and a real-world example.
The rest of the article is organized as follows: Section 2 presents the proposed adaptive GLARS
algorithm, Section 3 evaluates the performance of the proposed algorithm, and Section 4 concludes
the article.
2. Adaptive GLARS Algorithm
Based on the methodology outlined by Kayanan and Wijekoon (2020), we propose the adaptive GLARS algorithm as follows:
• Update the coefficient estimates as
βˆ_{j(i)} = βˆ_{j(i−1)} + ρ_i u_i, (2.1)
where ρ_i is a value between 0 and 1 representing the distance the estimate moves before another variable enters the model, and u_i is the equiangular vector.
• Calculate the direction u_i using
u_i = G_E (E_i′ X′X E_i)⁻¹ E_i′ X′ r_{i−1}, (2.2)
where E_i is the matrix with columns (e_{j1}, e_{j2}, ..., e_{ji}), e_j denotes the j-th standard unit vector in R^p and the indices j_1, ..., j_i are those of the selected variables, r_{i−1} is the current residual, and G_E depends on the specific estimator and is replaced by the respective expression for the estimator of interest, as listed in Table 1.
• Update ρ_i as
ρ_i = min{ρ_{ji}^+, ρ_{ji}^−, ρ_{ji}^*}, ρ_{ji} ∈ [0, 1], (2.3)
where
ρ_{ji}^± = [Cor(r_{i−1}, X_{ji}) ± Cor(r_{i−1}, X_j)] / [Cor(r_{i−1}, X_{ji}) ± Cor(X u_i, X_j)] for any j such that βˆ_{j(i−1)} = 0, (2.4)
and
ρ_{ji}^* = −βˆ_{j(i−1)} / u_i for any j such that βˆ_{j(i−1)} ≠ 0. (2.5)
• If ρ_i = ρ_{ji}^*, update E_i by removing the column e_j from E_{i−1}. Calculate the new residual r_i as
r_i = r_{i−1} − ρ_i X u_i, (2.6)
and move to the next step, where j_{i+1} is the value of j such that ρ_i = ρ_{ji}^+, ρ_{ji}^− or ρ_{ji}^*.
• End this step when ρi = 1.
Finally, output βˆ adp = βˆ / ŵ, the final coefficient estimates divided element-wise by the weights ŵ.
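To make the iteration concrete, below is a minimal sketch of the direction computation (2.2) and the step-length computation (2.3)-(2.5), assuming the columns of X have already been rescaled by the adaptive weights. G_E is illustrated with the ridge (RE) form; any other expression from Table 1 can be substituted. The function names, the default value of k, and the componentwise reading of u_i in (2.5) are our own assumptions, so this is a sketch rather than the authors' implementation.

```python
import numpy as np

def corr(a, b):
    """Sample correlation between two vectors."""
    return np.corrcoef(a, b)[0, 1]

def glars_direction(X, r_prev, active, k=0.1):
    """Equiangular direction u_i of (2.2) for the currently selected variables."""
    p = X.shape[1]
    E = np.eye(p)[:, active]                        # columns e_j of the selected indices
    XtX = X.T @ X
    G_E = np.linalg.inv(XtX + k * np.eye(p)) @ XtX  # ridge-type G_E (illustrative choice)
    return G_E @ E @ np.linalg.solve(E.T @ XtX @ E, E.T @ X.T @ r_prev)

def step_length(X, r_prev, u, beta_prev, active):
    """Candidate step lengths (2.3)-(2.5); the smallest value in [0, 1] is taken."""
    ji = active[-1]                                 # most recently entered variable
    Xu = X @ u
    candidates = [1.0]                              # rho_i = 1 terminates the algorithm
    for j in range(X.shape[1]):
        if j in active:
            continue
        if beta_prev[j] == 0:                       # (2.4): variable j may enter
            for sign in (1.0, -1.0):
                num = corr(r_prev, X[:, ji]) + sign * corr(r_prev, X[:, j])
                den = corr(r_prev, X[:, ji]) + sign * corr(Xu, X[:, j])
                rho = num / den
                if 0.0 <= rho <= 1.0:
                    candidates.append(rho)
    for j in active:
        if beta_prev[j] != 0 and u[j] != 0:         # (2.5): variable j may be dropped
            rho = -beta_prev[j] / u[j]              # componentwise reading of u_i (assumed)
            if 0.0 <= rho <= 1.0:
                candidates.append(rho)
    return min(candidates)

# One iteration then updates the estimates by (2.1) and the residual by (2.6):
#   u = glars_direction(X, r, active); rho = step_length(X, r, u, beta, active)
#   beta = beta + rho * u;  r = r - rho * (X @ u)
```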
3. Performance Evaluation
The prediction performance of the algorithms is compared using the Root Mean Square Error,
RMSE = √{(y_new − X_new βˆ)′(y_new − X_new βˆ) / n_new},
where (y_new, X_new) denotes the new data of size n_new which are not used to obtain the parameter estimates, and βˆ is the estimate of β obtained by the respective algorithm. A Monte Carlo simulation study and real-world examples are used for the comparison.
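For concreteness, a small sketch of this prediction-error computation under the split-sample form stated above; the function name is ours.

```python
import numpy as np

def prediction_rmse(beta_hat, X_new, y_new):
    """Root mean square prediction error on held-out data."""
    resid = y_new - X_new @ beta_hat
    return np.sqrt(np.mean(resid ** 2))
```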
Following McDonald and Galarneau (1975), the predictor variables are generated as
x_{i,j} = (1 − ρ²)^{1/2} z_{i,j} + ρ z_{i,p+1}, i = 1, 2, ..., n, j = 1, 2, ..., p,
where z_{i,j} is an independent standard normal pseudo-random number, and ρ is the theoretical correlation between any two explanatory variables.
In this study, we use a linear regression model with 100 observations and 20 predictors. The dependent variable is generated using the equation
y_i = β_1 x_{i,1} + β_2 x_{i,2} + · · · + β_{20} x_{i,20} + ε_i, i = 1, 2, ..., 100,
where ε_i is a normal pseudo-random number with mean zero and common variance σ². We choose β = (β_1, β_2, ..., β_{20}) as the normalized eigenvector corresponding to the largest eigenvalue of X′X, for which β′β = 1. To investigate the effects of different degrees of multicollinearity on the estimators, we choose ρ = 0.5, 0.7 and 0.9, representing weak, moderate and high multicollinearity, respectively. For the analysis, we simulate 50 data sets, each consisting of 50 observations to fit the model and 50 observations to calculate the RMSE.
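The data-generating design described above can be reproduced along the following lines; the random seed, the value σ = 1, and the variable names are our own illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(2024)        # seed is an arbitrary choice
n, p, rho, sigma = 100, 20, 0.9, 1.0     # sigma = 1 assumed for illustration

# predictors with theoretical pairwise correlation rho
z = rng.standard_normal((n, p + 1))
X = np.sqrt(1 - rho**2) * z[:, :p] + rho * z[:, [p]]

# beta: normalized eigenvector of X'X for the largest eigenvalue (beta'beta = 1)
eigval, eigvec = np.linalg.eigh(X.T @ X)
beta = eigvec[:, np.argmax(eigval)]

# response and the 50/50 split used for fitting and RMSE calculation
y = X @ beta + sigma * rng.standard_normal(n)
X_fit, X_val = X[:50], X[50:]
y_fit, y_val = y[:50], y[50:]
```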
The cross-validated RMSE values of the adaptive GLARS algorithms are displayed in Fig. 1 - Fig. 3, and the median cross-validated RMSE values of the algorithms are reported in Table 2 - Table 4.
[Boxplots of cross-validated RMSE for the eight algorithms: 1 adpLARS-LASSO, 2 adpLARS-EN, 3 adpLARS-AURE, 4 adpLARS-LE, 5 adpLARS-AULE, 6 adpLARS-PCRE, 7 adpLARS-rk, 8 adpLARS-rd.]
Figure 1: Cross-validated RMSE values of the adaptive GLARS algorithms when ρ = 0.5.
[Boxplots of cross-validated RMSE for the eight algorithms; legend as in Figure 1.]
Figure 2: Cross-validated RMSE values of the adaptive GLARS algorithms when ρ = 0.7.
[Boxplots of cross-validated RMSE for the eight algorithms; legend as in Figure 1.]
Figure 3: Cross-validated RMSE values of the adaptive GLARS algorithms when ρ = 0.9.
Table 2: Median Cross-validated RMSE values of the adaptive GLARS algorithms when ρ = 0.5.
Algorithms RMSE (k, d) α t No. of selected variables
adpLARS-LASSO 3.45489 – 1 6.6635 16
adpLARS-EN 3.41614 0.2 1 7.5795 17
adpLARS-AURE 3.44668 1.0 1 7.1685 17
adpLARS-LE 3.34648 0.3 1 7.1018 15
adpLARS-AULE 3.48312 0.2 1 8.0718 16
adpLARS-PCRE 3.31719 – 1 6.5019 16
adpLARS-rk 3.35712 0.2 1 6.0726 17
adpLARS-rd 3.47994 0.99 1 6.5019 16
Table 3: Median Cross-validated RMSE values of the adaptive GLARS algorithms when ρ = 0.7.
Algorithms RMSE (k, d) α t No. of selected variables
adpLARS-LASSO 3.53553 – 1 8.7067 16
adpLARS-EN 3.42320 0.3 0.5 8.9330 17
adpLARS-AURE 3.53440 0.7 1 8.0610 17
adpLARS-LE 3.45469 0.1 0.5 9.1520 15
adpLARS-AULE 3.56472 0.1 1 8.0821 16
adpLARS-PCRE 3.41530 – 1 9.5873 16
adpLARS-rk 3.35452 0.1 1 8.9412 16
adpLARS-rd 3.37755 0.2 1 8.8207 16
Table 4: Median Cross-validated RMSE values of the adaptive GLARS algorithms when ρ = 0.9.
Algorithms RMSE (k, d) α t No. of selected variables
adpLARS-LASSO 3.44950 – 0.5 4.0460 15
adpLARS-EN 3.39404 1.0 1 8.2710 17
adpLARS-AURE 3.50448 0.9 1 10.045 17
adpLARS-LE 3.49651 0.1 0.5 10.005 15
adpLARS-AULE 3.48735 0.1 0.5 8.0684 16
adpLARS-PCRE 3.49078 – 0.5 10.682 17
adpLARS-rk 3.42176 0.3 1 10.433 16
adpLARS-rd 3.37842 0.99 0.5 7.0576 15
Based on the insights gathered from Fig. 1 to Fig. 3 and Table 2 to Table 4, it is evident that the adpLARS-PCRE, adpLARS-rk, and adpLARS-rd algorithms demonstrate the best performance in terms of the RMSE criterion among the adaptive GLARS algorithms under weak, moderate, and high multicollinearity, respectively.
As a real-world example, we apply the algorithms to the Prostate Cancer Data of Stamey et al. (1989).
Table 5: Cross-validated RMSE values of Prostate Cancer Data using adaptive GLARS.
Algorithms RMSE (k, d) α t No. of selected variables
adpLARS-LASSO 0.77653 – 0.2 1.57112 7
adpLARS-EN 0.78716 0.3 1 0.80638 7
adpLARS-AURE 0.80638 1.0 0.9 0.80638 7
adpLARS-LE 0.80014 0.1 0.5 1.45884 7
adpLARS-AULE 0.79046 0.2 1 1.31322 6
adpLARS-PCRE 0.76890 – 0.9 1.44929 7
adpLARS-rk 0.77698 0.2 0.9 1.36273 7
adpLARS-rd 0.76854 0.7 0.9 1.44764 7
The cross-validated RMSE values obtained through the adaptive GLARS algorithms are sum-
marized in Table 5. Upon examining Table 5, it becomes evident that the adpLARS-rd algorithm
outperforms other algorithms when applied to the Prostate Cancer Data.
4. Conclusions
In conclusion, this study clearly shows that the adpLARS-rk and adpLARS-rd algorithms work well
for dealing with high dimensional linear regression problems, especially when there are many closely
related independent variables. These algorithms emerge as reliable tools for tackling high dimensional
regression models and offer promising avenues for future research and practical application in data-
driven environments.
Acknowledgement
The authors are grateful to the anonymous referee for a careful checking of the details and for helpful
comments that improved this paper.
References
Efron B, Hastie T, Johnstone I and Tibshirani R (2004). Least angle regression, The Annals of Statistics, 32(2), 407–499.
Kayanan M and Wijekoon P (2017). Performance of existing biased estimators and the respective
predictors in a misspecified linear regression model, Open Journal of Statistics, 7(5), 876–900.
Kayanan M and Wijekoon P (2020). Variable selection via biased estimators in the linear regression
model, Open Journal of Statistics, 10, 113–126.
McDonald GC and Galarneau DI (1975). A Monte Carlo evaluation of some ridge-type estimators, Journal of the American Statistical Association, 70(350), 407–416.
Stamey TA, Kabalin JN, et al. (1989). Prostate specific antigen in the diagnosis and treatment of adenocarcinoma of the prostate: II. Radical prostatectomy treated patients, Journal of Urology, 141(5), 1076–1083.
Tibshirani R (1996). Regression shrinkage and selection via the lasso, Journal of the Royal Statistical
Society: Series B (Methodological), 58(1), 267–288.
Zou H (2006). The adaptive lasso and its oracle properties, Journal of the American Statistical Asso-
ciation, 101(476), 1418–1429.
Zou H and Hastie T (2005). Regularization and variable selection via the elastic net, Journal of the
Royal Statistical Society: Series B, 67(2), 301–320.
Zou H and Zhang HH (2009). On the adaptive elastic-net with a diverging number of parameters, The Annals of Statistics, 37(4), 1733–1751.