
Bioinformatics, 2025, pp. 1–4
doi: DOI HERE
Advance Access Publication Date: Day Month Year

APPLICATION NOTE

pared: Model selection using multi-objective optimization

Priyam Das,1,∗ Sarah Robinson2 and Christine B. Peterson3

1Department of Biostatistics, Virginia Commonwealth University, 2Department of Statistics, Rice University and 3Department of Biostatistics, The University of Texas MD Anderson Cancer Center

∗Corresponding author. [email protected]
Received on Date Month 2025; revised on Date Month 2025; accepted on Date Month 2025

Abstract
Motivation: Model selection is a ubiquitous challenge in statistics. For penalized models, model selection typically
entails tuning hyperparameters to maximize a measure of fit or minimize out-of-sample prediction error. However, these
criteria fail to reflect other desirable characteristics, such as model sparsity, interpretability, or smoothness.
Results: We present the R package pared to enable the use of multi-objective optimization for model selection. Our
approach entails the use of Gaussian process-based optimization to efficiently identify solutions that represent desirable
trade-offs. Our implementation supports popular models with multiple objectives, including the elastic net, fused lasso,
fused graphical lasso, and group graphical lasso. Our R package generates interactive graphics that allow the user to
identify hyperparameter values that result in fitted models which lie on the Pareto frontier.
Availability: We provide the R package pared and vignettes illustrating its application to both simulated and real data
at https://github.com/priyamdas2/pared.

1. Introduction

Model selection is a classic and well-studied statistical problem. It is a non-trivial challenge to select a model that balances the trade-off of model parsimony, fit, and generalizability. To paint a picture of this challenge in the high-dimensional setting, we begin by considering the lasso (Tibshirani, 1996), a classic penalized regression approach with a single penalty parameter. The choice of this penalty parameter, typically denoted λ, is critical, as the value of λ determines both the estimated coefficient values as well as the model sparsity. Typically, λ is chosen using cross-validation, which utilizes data-splitting to estimate prediction error on a held-out test set (Wu and Wang, 2020). Given a grid of λ options, the optimal value is selected as the one that minimizes the out-of-sample prediction error.

Although simple, parameter selection via cross-validation has several key drawbacks. Firstly, it can be unstable, as different random splits of the data lead to different selected parameter values (Roberts and Nowak, 2014). Secondly, in practice, it may result in models that include more features than desirable, as the optimization is focused on prediction accuracy rather than sparsity. Finally, it may be time-consuming to re-fit the model for all the values on the grid. This may be particularly inefficient if the grid contains many values that are far from the optimum. As an additional challenge, prediction error on a test set is not a sensible target for unsupervised learning problems such as clustering or graphical model inference (Li et al., 2013b).
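To make this baseline concrete, the following minimal sketch selects λ for the lasso by grid-based cross-validation using the glmnet package; the data and variable names are purely illustrative:

```r
# Minimal sketch of grid-based CV for the lasso via glmnet (toy data).
library(glmnet)

set.seed(1)
X <- matrix(rnorm(100 * 50), 100, 50)            # n = 100, p = 50
y <- drop(X[, 1:5] %*% rep(1, 5)) + rnorm(100)   # only 5 true signals

cv_fit <- cv.glmnet(X, y, alpha = 1, nfolds = 10)  # alpha = 1: lasso penalty
cv_fit$lambda.min                       # lambda minimizing CV prediction error
sum(coef(cv_fit, s = "lambda.min") != 0) - 1  # selected features (excluding
                                              # intercept), often more than desired
```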
As an alternative, fit criteria may be applied for model selection. Classical measures of fit include the information criteria, most notably AIC (Akaike, 1974) and BIC (Schwarz, 1978). These may be applied to penalized regression models, but lack theoretical support in the p > n setting and tend to select overly dense models (Zou et al., 2007). Another vein of research relies on stability under resampling as a criterion for model selection (Meinshausen and Bühlmann, 2010; Liu et al., 2010). However, this approach inherits limitations of the base learner: for example, the lasso tends to select only one from among a set of correlated features, which may result in low stability for correlated predictors with the lasso as the base model.

The problem becomes even more challenging in models that require the choice of multiple hyperparameters. In particular, regression models that seek to address limitations of the lasso may incorporate more complex penalty structures with multiple tuning parameters. For example, the elastic net (Zou and Hastie, 2005) and fused lasso (Tibshirani et al., 2005), which target smoothness or grouping of the coefficients in addition to sparsity, require the choice of two tuning parameters. In the graphical modeling framework, the graphical lasso relies on a single ℓ1 penalty to achieve sparsity in the precision matrix (Friedman et al., 2008). Methods designed for the inference of multiple networks, including the fused and group graphical lasso (Danaher et al., 2014), incorporate an additional penalty term to encourage sharing of common edges or similar edge values across multiple networks.


Critically, these methods seek to achieve multiple desired objectives, including sparsity and some version of smoothness or similarity. Relying on a single measure of model fit for model selection neglects these desiderata. Pareto optimization, also known as multi-objective optimization, seeks to identify a set of solutions that lie along the Pareto front. These points represent optimal trade-offs in the sense that improving any of the criteria will necessarily worsen another of the targets.

Here, we introduce the pared R package for Pareto-optimal decision making in model selection. The name pared implies that our approach enables the user to pare or trim away model complexity to reach a solution that represents a reasonable trade-off of fit, sparsity, and smoothness.

2. Methods

Our proposed method is particularly suitable for models with multiple hyperparameters and multiple objectives.

2.1. Statistical models

We provide an implementation of our proposed approach for popular models including the elastic net, fused lasso, fused graphical lasso, and group graphical lasso. Here, we briefly describe these methods, their hyperparameters, and the competing objectives that we aim to minimize.

Elastic net. The elastic net (Zou and Hastie, 2005) combines ℓ1 and ℓ2 penalties to allow for both selection and regularization of the model coefficients. One attractive property of the elastic net is that correlated predictors are likely to be selected together, whereas the lasso tends to arbitrarily include one predictor from among a correlated set. The elastic net can be written:

$$\hat{\beta} \equiv \underset{\beta}{\operatorname{argmin}} \left\{ \frac{1}{2n} \|y - X\beta\|^2 + \lambda \left[ \alpha\|\beta\|_1 + (1-\alpha)\|\beta\|^2 \right] \right\}.$$

Varying values of α capture the spectrum of models from ridge (α = 0) to lasso (α = 1), while λ controls overall regularization. The two elastic net hyperparameters are typically selected using cross-validation on a grid. However, this approach has several drawbacks, as outlined in Section 1. In addition, recent work on the elastic net points out that the marginal likelihood has a flat optimal region across many (α, λ) combinations, which means that additional criteria besides fit need to be considered in model selection (van Nee et al., 2023). In the pared R package, we consider the following objectives:

• Fit: deviance
• Sparsity: number of non-zero coefficients
• Shrinkage: ℓ2 norm of the coefficients

Here, we consider increased shrinkage (smaller ∥β∥2) as a metric that reflects a reduced risk of overfitting.
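As an illustration, these three objectives can be computed for a given (α, λ) pair with the glmnet package. The following is a minimal sketch, not pared's internal code, and the data are simulated:

```r
# Sketch: computing the three elastic net objectives for one (alpha, lambda)
# pair using glmnet. Not pared's internal code; data are simulated.
library(glmnet)

set.seed(1)
X <- matrix(rnorm(100 * 20), 100, 20)
y <- drop(X[, 1:3] %*% c(2, -1, 1)) + rnorm(100)

elastic_net_objectives <- function(alpha, lambda, X, y) {
  fit <- glmnet(X, y, alpha = alpha, lambda = lambda)
  beta <- as.numeric(coef(fit))[-1]       # drop the intercept
  c(fit = deviance(fit),                  # fit: deviance
    sparsity = sum(beta != 0),            # sparsity: non-zero coefficients
    shrinkage = sqrt(sum(beta^2)))        # shrinkage: l2 norm of coefficients
}

elastic_net_objectives(alpha = 0.5, lambda = 0.1, X, y)
```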

Fused lasso. The fused lasso (Tibshirani et al., 2005) includes an ℓ1 penalty to induce sparsity and a penalty on the successive differences of the model coefficients to encourage smoothness across ordered variables. The fused lasso model is as follows:

$$\hat{\beta} \equiv \underset{\beta}{\operatorname{argmin}} \left\{ \frac{1}{2n} \|y - X\beta\|^2 + \lambda_1 \|\beta\|_1 + \lambda_2 \sum_{j=2}^{p} |\beta_j - \beta_{j-1}| \right\}.$$

The hyperparameters λ1 and λ2 control overall sparsity and sparsity in differences, respectively. We consider the following objectives in model selection:

• Fit: residual sum of squares
• Sparsity: number of non-zero coefficients
• Roughness: mean absolute difference of consecutive coefficients
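The sketch below (our own helper, independent of any particular solver) evaluates these three criteria for a fitted coefficient vector, e.g., one obtained with the genlasso package:

```r
# Sketch: the three fused lasso objectives for a fitted coefficient vector.
fused_lasso_objectives <- function(beta, X, y) {
  c(fit = sum((y - drop(X %*% beta))^2),   # fit: residual sum of squares
    sparsity = sum(beta != 0),             # sparsity: non-zero coefficients
    roughness = mean(abs(diff(beta))))     # roughness: mean |beta_j - beta_{j-1}|
}

beta_hat <- c(0, 0, 1.2, 1.2, 1.1, 0, 0, 0.5)   # illustrative estimate
X <- matrix(rnorm(40 * 8), 40, 8)
y <- drop(X %*% beta_hat) + rnorm(40)
fused_lasso_objectives(beta_hat, X, y)
```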
Joint graphical lasso. The joint graphical lasso (JGL) (Danaher et al., 2014) seeks to estimate multiple graphical models, achieving overall sparsity through an ℓ1 penalty on the precision matrix for each group Θ(k) and encouraging sharing across groups through a penalty on cross-group differences. The general form of the JGL is:

$$\{\hat{\Theta}\} \equiv \underset{\{\Theta\}}{\operatorname{argmax}} \left\{ \sum_{k=1}^{K} n_k \left[ \log \det(\Theta^{(k)}) - \operatorname{tr}(S^{(k)} \Theta^{(k)}) \right] - P(\{\Theta\}) \right\},$$

where K is the number of sample groups, nk is the sample size for the kth group, S(k) is the empirical covariance matrix for the kth group, and P is a penalty function.

There are two variants of this model: the fused and the group graphical lasso, defined by the choice of penalty function P. The fused graphical lasso combines a within-group ℓ1 penalty with a fused lasso penalty designed to encourage similar edge values across groups:

$$P(\{\Theta\}) = \lambda_1 \sum_{k=1}^{K} \sum_{i \neq j} |\theta_{ij}^{(k)}| + \lambda_2 \sum_{k < k'} \sum_{i,j} |\theta_{ij}^{(k)} - \theta_{ij}^{(k')}|.$$

The group graphical lasso combines a within-group ℓ1 penalty with a group lasso penalty designed to encourage overlapping edge selection across groups:

$$P(\{\Theta\}) = \lambda_1 \sum_{k=1}^{K} \sum_{i \neq j} |\theta_{ij}^{(k)}| + \lambda_2 \sum_{i \neq j} \left( \sum_{k=1}^{K} \left(\theta_{ij}^{(k)}\right)^2 \right)^{1/2}.$$

In both the fused and group graphical lasso, λ1 controls model sparsity while λ2 drives cross-group similarity.

For the fused graphical lasso, we consider the following three objectives:

• Fit: Akaike information criterion (AIC)
• Sparsity: total number of selected edges across groups
• Cross-group similarity: mean absolute difference of precision matrices across groups

For the group graphical lasso, we consider an overlapping set of criteria:

• Fit: Akaike information criterion (AIC)
• Sparsity: total number of selected edges across groups
• Cross-group similarity: number of shared edges

Our R package enables users to visualize trade-offs across these metrics and identify parameter combinations that yield desirable compromises using interactive graphics. In the accompanying GitHub vignette at https://github.com/priyamdas2/pared, we provide example use cases for the models described above.
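A hedged sketch of evaluating the fused graphical lasso objectives for one (λ1, λ2) pair is given below; it assumes the JGL package's JGL() interface (Danaher et al., 2014), and the AIC and edge-counting helpers are our own illustrative code:

```r
# Sketch: fused graphical lasso objectives via the JGL package (assumed API).
library(JGL)

set.seed(1)
Y <- list(matrix(rnorm(100 * 10), 100, 10),   # toy data: two sample groups,
          matrix(rnorm(80 * 10), 80, 10))     # 10 variables each

# Per-group AIC in the style of Danaher et al. (2014):
# n * tr(S Theta) - n * log det(Theta) + 2 * (number of selected edges)
aic_group <- function(S, theta, n) {
  edges <- sum(theta[upper.tri(theta)] != 0)
  as.numeric(n * sum(S * theta) -
             n * determinant(theta, logarithm = TRUE)$modulus + 2 * edges)
}

fgl_objectives <- function(lambda1, lambda2, Y) {
  fit <- JGL(Y, penalty = "fused", lambda1 = lambda1, lambda2 = lambda2,
             return.whole.theta = TRUE)
  thetas <- fit$theta                 # list of estimated precision matrices
  c(fit = sum(mapply(function(y, th) aic_group(cov(y), th, nrow(y)),
                     Y, thetas)),
    sparsity = sum(sapply(thetas, function(th) sum(th[upper.tri(th)] != 0))),
    similarity = mean(abs(thetas[[1]] - thetas[[2]])))  # mean abs difference
}

fgl_objectives(lambda1 = 0.25, lambda2 = 0.1, Y)
```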
2.2. Optimization approach

Various algorithms have been proposed for hyperparameter optimization in the machine learning literature, including Bayesian optimization (Snoek et al., 2012), evolutionary algorithms (Karl et al., 2023), and particle swarms (Xue et al., 2012). Here, we seek to bridge this literature to provide an easy-to-use model selection tool for popular regression and graphical models in R. To do so, we rely on the R package GPareto, which provides tools for solving multi-objective optimization problems in settings where evaluating the objective functions is computationally expensive. This optimization framework uses Gaussian process (GP) models to emulate each objective and implements sequential optimization strategies to efficiently explore trade-offs between conflicting objectives.

Fig. 1. Estimated precision matrices obtained using the group graphical lasso for ovarian (OV), uterine corpus endometrial carcinoma (UCEC),
and uterine carcinosarcoma (UCS) for the breast reactive, cell cycle, hormone receptor, and hormone signaling breast pathways. (a) shows the network
corresponding to the model with minimum AIC. (b) displays a sparser network obtained using pared, representing one of the Pareto-optimal solutions.

More precisely, the package addresses models with multiple outputs, $y^{(1)}(x), \ldots, y^{(q)}(x)$, where each $y^{(i)}: \mathcal{X} \subset \mathbb{R}^d \to \mathbb{R}$ is simultaneously optimized over the domain $\mathcal{X}$. Because objectives are often conflicting (e.g., quality versus cost), no single solution minimizes all objectives at once. Instead, the goal is to approximate the Pareto set: the set of non-dominated solutions where no objective can be improved without worsening at least one other (Collette and Siarry, 2003). The image of this set in the objective space forms the Pareto front, which helps users identify well-balanced trade-offs.
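For intuition, non-dominated points can be identified with a few lines of base R; this toy helper (our own, not part of pared or GPareto) assumes all objectives are to be minimized:

```r
# Toy helper: keep the non-dominated rows of an objective matrix (minimization).
pareto_filter <- function(obj) {
  keep <- sapply(seq_len(nrow(obj)), function(i) {
    !any(sapply(seq_len(nrow(obj)), function(j) {
      # row j dominates row i: no worse everywhere, strictly better somewhere
      j != i && all(obj[j, ] <= obj[i, ]) && any(obj[j, ] < obj[i, ])
    }))
  })
  obj[keep, , drop = FALSE]
}

obj <- cbind(fit = c(1.0, 2.0, 1.5), sparsity = c(10, 4, 3))
pareto_filter(obj)   # row 2 is dominated by row 3 and is removed
```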
Our proposed R package, pared, enables users to construct such Pareto fronts by balancing context-specific objectives beyond a single criterion, for popular statistical models designed to address conflicting objectives. Computation times are provided in Table S1 of the Supplementary Material.
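The sketch below shows the intended usage pattern, assuming GPareto's easyGParetoptim() wrapper; the toy bi-objective function stands in for model-based criteria such as those defined in Section 2.1:

```r
# Sketch: GP-based multi-objective optimization with GPareto (assumed API).
library(GPareto)

toy_objectives <- function(x) {
  # x = c(lambda1, lambda2); two deliberately conflicting criteria to minimize
  c(sum((x - 0.2)^2), sum((x - 0.8)^2))
}

res <- easyGParetoptim(fn = toy_objectives,
                       budget = 20,            # total objective evaluations
                       lower = c(0.01, 0.01),  # hyperparameter lower bounds
                       upper = c(1, 1))        # hyperparameter upper bounds
res$par     # hyperparameter settings on the estimated Pareto set
res$values  # corresponding objective values (the Pareto front)
```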
3. Illustrative analysis

In this section, we illustrate the utility of our approach in selecting a parsimonious and smooth set of protein-signaling networks across gynecological cancer types. To conduct this analysis, we obtained normalized protein abundance data from The Cancer Proteome Atlas (TCPA, Li et al., 2013a), which quantifies protein markers using antibodies targeting key oncogenic and cellular signaling pathways. Recent studies have highlighted that similar pathway activities are observed across the gynecological cancer types ovarian (OV), uterine corpus endometrial carcinoma (UCEC), and uterine carcinosarcoma (UCS) (Berger et al., 2018). Specifically, the pathways breast reactive, cell cycle, hormone receptor, and hormone signaling breast were found to be substantially activated in these cancer types (Das et al., 2020). To leverage the potential structural similarity of signaling networks across these cancer types, we applied the group graphical lasso to simultaneously estimate the proteomic networks of OV, UCEC, and UCS using a set of 20 proteins associated with the aforementioned pathways, with sample sizes of 428, 404, and 48, respectively. To illustrate the standard model selection approach, we first applied the AIC, as recommended by Danaher et al. (2014). The corresponding estimated precision matrices characterizing the proteomic networks are shown in Figure 1(a). Subsequently, we derived the Pareto-optimal set of models that balance AIC, sparsity, and the number of shared edges across networks using pared, with a run time of 2.4 minutes.

Figure 2 presents a screenshot of the interactive Pareto-front plot generated by pared, displaying the optimal set of solution points obtained using the group graphical lasso. Hovering the mouse cursor over any point reveals the corresponding model details, including tuning parameter values and model selection metrics. Figure 1(b) illustrates one such Pareto-optimal model, which is sparser than the model selected solely based on AIC. This demonstrates the potential of our approach to identify alternative optimal models that are more sparse, offering greater potential for network reliability and interpretability.
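An interactive front in this spirit can be drawn with plotly, as in the hedged sketch below; pared's own plotting code may differ, and the objective values shown are purely illustrative:

```r
# Sketch: interactive 3-D Pareto front with hover details (illustrative values).
library(plotly)

front <- data.frame(lambda1 = c(0.05, 0.10, 0.20),
                    lambda2 = c(0.02, 0.05, 0.01),
                    aic     = c(512, 498, 505),
                    edges   = c(42, 30, 25),
                    shared  = c(18, 14, 10))

plot_ly(front, x = ~aic, y = ~edges, z = ~shared,
        type = "scatter3d", mode = "markers",
        text = ~paste0("lambda1 = ", lambda1, "<br>lambda2 = ", lambda2),
        hoverinfo = "text")
```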

Fig. 2. Interactive plot generated using the pared R package, displaying the set of all Pareto-optimal networks obtained by fitting the group graphical lasso to the gynecological cancer proteomic dataset. Hovering over any point reveals the corresponding tuning parameter values and the values of the multi-objective criteria at that solution.

4. Conclusion

In this work, we introduced pared, a flexible and extensible R package for model selection through multi-objective optimization. Unlike traditional approaches that optimize a single criterion, such as cross-validation error or information criteria, pared enables identification of a set of Pareto-optimal models that balance competing objectives, including sparsity, interpretability, and structural similarity across groups. The package supports several widely used penalized models, including the elastic net, fused lasso, fused graphical lasso, and group graphical lasso, with model evaluation guided by context-appropriate metrics. Optimization is performed using Gaussian process-based surrogate modeling, allowing for efficient and scalable exploration of the hyperparameter space.

We demonstrated the utility of pared in a case study involving proteomics networks across gynecological cancers represented in The Cancer Proteome Atlas. The method identified a diverse set of high-quality models, some of which offered greater sparsity or structural coherence than those selected by traditional single-objective criteria such as AIC. An interactive visualization tool further enhances interpretability and supports user-guided model exploration.

Overall, pared provides a principled, reproducible, and interpretable framework for multi-objective model selection, with broad relevance to statistical learning and computational biology. Future work may extend the package to incorporate additional model classes and optimization strategies, further expanding its applicability.

Additional information

Funding

PD was partially supported by NIH/NCI Cancer Center Support Grant P30 CA016059. SR was partially supported by NSF Graduate Research Fellowship DGE 1842494 and NIH T32 Grant CA96520-13: Training Program in Biostatistics for Cancer Research. CBP was partially supported by NIH/NCI R01 CA244845 and NIH/NCI CCSG P30CA016672 (Biostatistics Resource Group).

References

Akaike, H. (1974). A new look at the statistical model identification. IEEE T Automat Contr, 19(6):716–723.
Berger, A. C., Korkut, A., et al. (2018). A comprehensive pan-cancer molecular study of gynecologic and breast cancers. Cancer Cell, 33(4):690–705.
Collette, Y. and Siarry, P. (2003). Multiobjective Optimization: Principles and Case Studies. Springer-Verlag, Berlin, Heidelberg.
Danaher, P., Wang, P., and Witten, D. M. (2014). The joint graphical lasso for inverse covariance estimation across multiple classes. J Roy Stat Soc B, 76(2):373–397.
Das, P. et al. (2020). NExUS: Bayesian simultaneous network estimation across unequal sample sizes. Bioinformatics, 36(3):798–804.
Friedman, J., Hastie, T., and Tibshirani, R. (2008). Sparse inverse covariance estimation with the graphical lasso. Biostatistics, 9(3):432–441.
Karl, F., Pielok, T., Moosbauer, J., Pfisterer, F., Coors, S., et al. (2023). Multi-objective hyperparameter optimization in machine learning—an overview. ACM Trans Evol Learn Optim, 3(4):1–50.
Li, J. et al. (2013a). TCPA: a resource for cancer functional proteomics data. Nat Methods, 10:1046–1047.
Li, S., Hsu, L., Peng, J., and Wang, P. (2013b). Bootstrap inference for network construction with an application to a breast cancer microarray study. Ann Appl Stat, 7(1):391.
Liu, H., Roeder, K., and Wasserman, L. (2010). Stability approach to regularization selection (StARS) for high dimensional graphical models. Adv Neur In, 23.
Meinshausen, N. and Bühlmann, P. (2010). Stability selection. J Roy Stat Soc B, 72(4):417–473.
Roberts, S. and Nowak, G. (2014). Stabilizing the lasso against cross-validation variability. Comput Stat Data An, 70:198–211.
Schwarz, G. (1978). Estimating the dimension of a model. Ann Stat, 6(2):461–464.
Snoek, J., Larochelle, H., and Adams, R. P. (2012). Practical Bayesian optimization of machine learning algorithms. Adv Neur In, 25.
Tibshirani, R. (1996). Regression shrinkage and selection via the lasso. J Roy Stat Soc B, 58(1):267–288.
Tibshirani, R., Saunders, M., Rosset, S., Zhu, J., and Knight, K. (2005). Sparsity and smoothness via the fused lasso. J Roy Stat Soc B, 67(1):91–108.
van Nee, M. M., van de Brug, T., and van de Wiel, M. A. (2023). Fast marginal likelihood estimation of penalties for group-adaptive elastic net. J Comput Graph Stat, 32(3):950–960.
Wu, Y. and Wang, L. (2020). A survey of tuning parameter selection for high-dimensional regression. Annu Rev Stat Appl, 7(1):209–226.
Xue, B., Zhang, M., and Browne, W. N. (2012). Particle swarm optimization for feature selection in classification: A multi-objective approach. IEEE T Cybernetics, 43(6):1656–1671.
Zou, H. and Hastie, T. (2005). Regularization and variable selection via the elastic net. J Roy Stat Soc B, 67(2):301–320.
Zou, H., Hastie, T., and Tibshirani, R. (2007). On the “degrees of freedom” of the lasso. Ann Stat, 35(5):2173–2192.
