APPLICATION NOTE
1 Department of Biostatistics, Virginia Commonwealth University, 2 Department of Statistics, Rice University and 3 Department of Biostatistics, The University of Texas MD Anderson Cancer Center
∗Corresponding author. [email protected]
Abstract
Motivation: Model selection is a ubiquitous challenge in statistics. For penalized models, model selection typically
entails tuning hyperparameters to maximize a measure of fit or minimize out-of-sample prediction error. However, these
criteria fail to reflect other desirable characteristics, such as model sparsity, interpretability, or smoothness.
Results: We present the R package pared to enable the use of multi-objective optimization for model selection. Our
approach entails the use of Gaussian process-based optimization to efficiently identify solutions that represent desirable
trade-offs. Our implementation includes popular models with multiple objectives including the elastic net, fused lasso,
fused graphical lasso, and group graphical lasso. Our R package generates interactive graphics that allow the user to
identify hyperparameter values that result in fitted models which lie on the Pareto frontier.
Availability: We provide the R package pared and vignettes illustrating its application to both simulated and real data
at https://github.com/priyamdas2/pared.
© The Author 2025. Published by Oxford University Press. All rights reserved. For permissions, please e-mail:
[email protected]
Das et al.
Critically, these methods seek to achieve multiple desired objectives, including sparsity and some version of smoothness or similarity. Relying on a single measure of model fit for model selection neglects these desiderata. Pareto optimization, also known as multi-objective optimization, seeks to identify a set of solutions that lie along the Pareto front. These points represent optimal trade-offs in the sense that improving any of the criteria will necessarily worsen another of the targets.

Here, we introduce the pared R package for Pareto-optimal decision making in model selection. The name pared implies that our approach enables the user to pare or trim away model complexity to reach a solution that represents a reasonable trade-off of fit, sparsity, and smoothness.

2. Methods

Our proposed method is particularly suitable for models with multiple hyperparameters and multiple objectives.

2.1. Statistical models

We provide an implementation of our proposed approach for popular models including the elastic net, fused lasso, fused graphical lasso, and group graphical lasso. Here, we briefly describe these methods, their hyperparameters, and the competing objectives that we aim to minimize.

Elastic net. The elastic net (Zou and Hastie, 2005) combines ℓ1 and ℓ2 penalties to allow for both selection and regularization of the model coefficients. One attractive property of the elastic net is that correlated predictors are likely to be selected together, whereas the lasso tends to arbitrarily include one predictor from among a correlated set. The elastic net can be written:

\hat{\beta} \equiv \operatorname*{argmin}_{\beta} \left[ \frac{1}{2n} \| y - X\beta \|^2 + \lambda \left\{ \alpha \|\beta\|_1 + (1 - \alpha) \|\beta\|^2 \right\} \right].

Varying values of α capture the spectrum of models from lasso (α = 1) to ridge (α = 0), while λ controls overall regularization. The two elastic net hyperparameters are typically selected using cross-validation on a grid. However, this approach has several drawbacks as outlined in Section 1. In addition, recent work on the elastic net points out that the marginal likelihood has a flat optimal region across many (α, λ) combinations, which means that additional criteria besides fit need to be considered in model selection (van Nee et al., 2023). In the pared R package, we consider the following objectives:

• Fit: deviance
• Sparsity: number of non-zero coefficients
• Shrinkage: ℓ2 norm of the coefficients

Joint graphical lasso. The joint graphical lasso (JGL) (Danaher et al., 2014) seeks to estimate multiple graphical models, achieving overall sparsity through an ℓ1 penalty on the precision matrix for each group Θ^(k) and encouraging sharing across groups through a penalty on cross-group differences. The general form of the JGL is:

\{\hat{\Theta}\} \equiv \operatorname*{argmax}_{\{\Theta\}} \sum_{k=1}^{K} n_k \left[ \log \det(\Theta^{(k)}) - \mathrm{tr}(S^{(k)} \Theta^{(k)}) \right] - P(\{\Theta\}),

where K is the number of sample groups, n_k is the sample size for the kth group, S^(k) is the empirical covariance matrix for the kth group, and P is a penalty function.

There are two variants of this model: the fused and the group graphical lasso, defined by the choice of penalty function P. The fused graphical lasso combines a within-group ℓ1 penalty with a fused lasso penalty designed to encourage similar edge values across groups:

P(\{\Theta\}) = \lambda_1 \sum_{k=1}^{K} \sum_{i \neq j} |\theta_{ij}^{(k)}| + \lambda_2 \sum_{k < k'} \sum_{i,j} |\theta_{ij}^{(k)} - \theta_{ij}^{(k')}|.

The group graphical lasso combines a within-group ℓ1 penalty with a group lasso penalty designed to encourage overlapping edge selection across groups:

P(\{\Theta\}) = \lambda_1 \sum_{k=1}^{K} \sum_{i \neq j} |\theta_{ij}^{(k)}| + \lambda_2 \sum_{i \neq j} \left( \sum_{k=1}^{K} (\theta_{ij}^{(k)})^2 \right)^{1/2}.

In both the fused and group graphical lasso, λ1 controls model sparsity while λ2 drives cross-group similarity.

For the fused graphical lasso, we consider the following three objectives:

• Fit: Akaike information criterion (AIC)
• Sparsity: total number of selected edges across groups
• Cross-group similarity: mean absolute difference of precision matrices across groups

For the group graphical lasso, we consider an overlapping set of criteria:

• Fit: Akaike information criterion (AIC)
• Sparsity: total number of selected edges across groups
• Cross-group similarity: number of shared edges

Our R package enables users to visualize trade-offs across these metrics and identify parameter combinations that yield desirable compromises using interactive graphics. In the accompanying GitHub vignette https://github.com/priyamdas2/pared, we provide example use cases for the models described above.
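To make the three elastic-net objectives concrete, the following Python sketch evaluates fit, sparsity, and shrinkage over a small (α, λ) grid using a simplified hand-rolled coordinate descent solver. This is an illustration of the trade-off computation only, not the implementation used by pared; the simulated data, grid values, and the use of the residual sum of squares as a stand-in for deviance are all choices made for this example.

```python
import numpy as np

def elastic_net_cd(X, y, lam, alpha, n_iter=200):
    """Simplified coordinate descent for
    (1/2n)||y - X b||^2 + lam * (alpha * ||b||_1 + (1 - alpha) * ||b||^2)."""
    n, p = X.shape
    b = np.zeros(p)
    col_ss = (X ** 2).sum(axis=0) / n                # (1/n) ||x_j||^2
    for _ in range(n_iter):
        for j in range(p):
            r = y - X @ b + X[:, j] * b[j]           # partial residual excluding j
            rho = X[:, j] @ r / n
            # soft-threshold for the l1 term, extra shrinkage for the l2 term
            b[j] = np.sign(rho) * max(abs(rho) - lam * alpha, 0.0) / (
                col_ss[j] + 2 * lam * (1 - alpha))
    return b

rng = np.random.default_rng(0)
n, p = 100, 10
X = rng.normal(size=(n, p))
y = X[:, 0] * 2.0 + X[:, 1] * 2.0 + rng.normal(size=n)

grid = []
for alpha in (0.2, 0.8):             # alpha = 1 recovers the lasso, alpha = 0 the ridge
    for lam in (0.05, 0.5):          # lambda: overall regularization strength
        b = elastic_net_cd(X, y, lam, alpha)
        fit = float(np.sum((y - X @ b) ** 2))        # fit: RSS (deviance up to constants)
        sparsity = int(np.sum(b != 0))               # sparsity: non-zero coefficients
        shrinkage = float(np.linalg.norm(b))         # shrinkage: l2 norm
        grid.append((alpha, lam, fit, sparsity, shrinkage))
print(grid)
```

Each tuple in `grid` is one candidate model's position in objective space; larger λ yields smaller coefficient norms, illustrating why no single (α, λ) pair optimizes all three criteria at once.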
Fig. 1. Estimated precision matrices obtained using the group graphical lasso for ovarian (OV), uterine corpus endometrial carcinoma (UCEC),
and uterine carcinosarcoma (UCS) for the breast reactive, cell cycle, hormone receptor, and hormone signaling breast pathways. (a) shows the network
corresponding to the model with minimum AIC. (b) displays a sparser network obtained using pared, representing one of the Pareto-optimal solutions.
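The two graphical lasso penalty functions defined in Section 2.1 can likewise be evaluated directly. The following numpy sketch computes the fused and group penalties for a list of per-group precision matrices; the toy 3×3 matrices and (λ1, λ2) values are arbitrary illustrations, not outputs of pared.

```python
import numpy as np

def fused_penalty(thetas, lam1, lam2):
    """P = lam1 * sum_k sum_{i!=j} |theta_ij^(k)|
         + lam2 * sum_{k<k'} sum_{i,j} |theta_ij^(k) - theta_ij^(k')|."""
    K = len(thetas)
    off = ~np.eye(thetas[0].shape[0], dtype=bool)    # off-diagonal mask
    l1 = sum(np.abs(t[off]).sum() for t in thetas)
    fuse = sum(np.abs(thetas[k] - thetas[kk]).sum()
               for k in range(K) for kk in range(k + 1, K))
    return lam1 * l1 + lam2 * fuse

def group_penalty(thetas, lam1, lam2):
    """P = lam1 * sum_k sum_{i!=j} |theta_ij^(k)|
         + lam2 * sum_{i!=j} sqrt(sum_k theta_ij^(k)^2)."""
    off = ~np.eye(thetas[0].shape[0], dtype=bool)
    l1 = sum(np.abs(t[off]).sum() for t in thetas)
    stacked = np.stack(thetas)                       # K x p x p
    grp = np.sqrt((stacked ** 2).sum(axis=0))[off].sum()
    return lam1 * l1 + lam2 * grp

# two toy 3x3 precision matrices for K = 2 groups
t1 = np.array([[2.0, 0.5, 0.0], [0.5, 2.0, 0.3], [0.0, 0.3, 2.0]])
t2 = np.array([[2.0, 0.5, 0.1], [0.5, 2.0, 0.0], [0.1, 0.0, 2.0]])
print(fused_penalty([t1, t2], lam1=1.0, lam2=1.0))   # 3.6
print(group_penalty([t1, t2], lam1=1.0, lam2=1.0))
```

For these matrices the shared edge (1, 2) contributes nothing to the fusion term, while the edges present in only one group are penalized by both the ℓ1 and cross-group terms, reflecting how λ2 rewards similar networks.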
multiple outputs, y^(1)(x), . . . , y^(q)(x), where each y^(i) : X ⊂ R^d → R is simultaneously optimized over the domain X. Because objectives are often conflicting (e.g., quality versus cost), no single solution minimizes all objectives at once. Instead, the goal is to approximate the Pareto set, that is, the set of non-dominated solutions where no objective can be improved without worsening at least one other (Collette and Siarry, 2003). The image of this set in the objective space forms the Pareto front, which helps users identify well-balanced trade-offs. Our proposed R package, pared, enables users to construct such Pareto fronts by balancing context-specific objectives beyond a single criterion, for popular statistical models designed to address conflicting objectives. Computation times are provided in Table S1 of the Supplementary Material.

3. Illustrative analysis

In this section, we illustrate the utility of our approach in selecting a parsimonious and smooth set of protein-signaling networks across gynecological cancer types. To conduct this analysis, we obtained normalized protein abundance data from The Cancer Proteome Atlas (TCPA, Li et al., 2013a), which quantifies protein markers using antibodies targeting key oncogenic and cellular signaling pathways. Recent studies have highlighted that similar pathway activities are observed across the gynecological cancer types ovarian (OV), uterine corpus endometrial carcinoma (UCEC), and uterine carcinosarcoma (UCS) (Berger et al., 2018). Specifically, the pathways breast reactive, cell cycle, hormone receptor, and hormone signaling breast were found to be substantially activated in these cancer types (Das et al., 2020). To leverage the potential structural similarity of signaling networks across these cancer types, we applied the group graphical lasso to simultaneously estimate the proteomic networks of OV, UCEC, and UCS using a set of 20 proteins associated with the aforementioned pathways, with sample sizes of 428, 404, and 48, respectively. To illustrate the standard model selection approach, we first applied the AIC, as recommended by Danaher et al. (2014). The corresponding estimated precision matrices characterizing the proteomic networks are shown in Figure 1(a).

Subsequently, we derived the Pareto-optimal set of models that balance AIC, sparsity, and the number of shared edges across networks using pared, with a run time of 2.4 minutes. Figure 2 presents a screenshot of the interactive Pareto-front plot generated by pared, displaying the optimal set of solution points obtained using the group graphical lasso. Hovering the mouse cursor over any point reveals the corresponding model details, including tuning parameter values and model selection metrics. Figure 1(b) illustrates one such Pareto-optimal model, which is sparser than the model selected solely based on AIC. This demonstrates the potential of our approach to identify alternative optimal models that are more sparse, offering greater potential for network reliability and interpretability.
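The non-dominated filtering that underlies such a Pareto front can be sketched as a simple pairwise check over candidate objective vectors, with every objective treated as a quantity to minimize. This generic illustration is not the Gaussian process-based search that pared performs; the candidate values below are invented for the example.

```python
import numpy as np

def pareto_mask(obj):
    """Boolean mask of non-dominated rows of obj (n_points x n_objectives),
    minimizing every objective. A point is dominated if another point is
    no worse in all objectives and strictly better in at least one."""
    obj = np.asarray(obj, dtype=float)
    n = obj.shape[0]
    keep = np.ones(n, dtype=bool)
    for i in range(n):
        for j in range(n):
            if i != j and np.all(obj[j] <= obj[i]) and np.any(obj[j] < obj[i]):
                keep[i] = False
                break
    return keep

# toy objective vectors: (fit criterion, number of edges, cross-group dissimilarity)
candidates = np.array([
    [1.0, 10, 0.5],   # best fit, but dense
    [2.0, 4, 0.4],    # worse fit, sparser and more similar
    [2.5, 4, 0.6],    # dominated by the previous point
    [3.0, 2, 0.3],    # sparsest
])
print(pareto_mask(candidates))
```

Only the third candidate is removed: the second is at least as good in every objective and strictly better in fit, whereas the remaining points each represent a distinct trade-off and so all lie on the front.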
Fig. 2. Interactive plot generated using the pared R package, displaying the set of all Pareto-optimal networks obtained by fitting the group graphical lasso to the gynecological cancer proteomic dataset. Hovering over any point reveals the corresponding tuning parameter values and the values of the multi-objective criteria at that solution.

4. Conclusion

In this work, we introduced pared, a flexible and extensible R package for model selection through multi-objective optimization. Unlike traditional approaches that optimize a single criterion, such as cross-validation error or information criteria, pared enables identification of a set of Pareto-optimal models that balance competing objectives, including sparsity, interpretability, and structural similarity across groups. The package supports several widely used penalized models, including the elastic net, fused lasso, fused graphical lasso, and group graphical lasso, with model evaluation guided by context-appropriate metrics. Optimization is performed using Gaussian process-based surrogate modeling, allowing for efficient and scalable exploration of the hyperparameter space.

We demonstrated the utility of pared in a case study involving proteomics networks across gynecological cancers represented in The Cancer Proteome Atlas. The method identified a diverse set of high-quality models, some of which offered greater sparsity or structural coherence than those selected by traditional single-objective criteria such as AIC. An interactive visualization tool further enhances interpretability and supports user-guided model exploration.

Overall, pared provides a principled, reproducible, and interpretable framework for multi-objective model selection, with broad relevance to statistical learning and computational biology. Future work may extend the package to incorporate additional model classes and optimization strategies, further expanding its applicability.

Additional information

Funding

PD was partially supported by NIH/NCI Cancer Center Support Grant P30 CA016059. SR was partially supported by NSF Graduate Research Fellowship DGE 1842494 and NIH T32 Grant CA96520-13: Training program in Biostatistics for Cancer Research. CBP was partially supported by

References

Akaike, H. (1974). A new look at the statistical model identification. IEEE T Automat Contr, 19(6):716–723.
Berger, A. C., Korkut, A., et al. (2018). A comprehensive pan-cancer molecular study of gynecologic and breast cancers. Cancer Cell, 33(4):690–705.
Collette, Y. and Siarry, P. (2003). Multiobjective Optimization: Principles and Case Studies. Springer-Verlag, Berlin, Heidelberg.
Danaher, P., Wang, P., and Witten, D. M. (2014). The joint graphical lasso for inverse covariance estimation across multiple classes. J Roy Stat Soc B, 76(2):373–397.
Das, P. et al. (2020). NExUS: Bayesian simultaneous network estimation across unequal sample sizes. Bioinformatics, 36(3):798–804.
Friedman, J., Hastie, T., and Tibshirani, R. (2008). Sparse inverse covariance estimation with the graphical lasso. Biostatistics, 9(3):432–441.
Karl, F., Pielok, T., Moosbauer, J., Pfisterer, F., Coors, S., et al. (2023). Multi-objective hyperparameter optimization in machine learning: an overview. ACM Trans Evol Learn Optim, 3(4):1–50.
Li, J. et al. (2013a). TCPA: a resource for cancer functional proteomics data. Nat Methods, 10:1046–1047.
Li, S., Hsu, L., Peng, J., and Wang, P. (2013b). Bootstrap inference for network construction with an application to a breast cancer microarray study. Ann Appl Stat, 7(1):391.
Liu, H., Roeder, K., and Wasserman, L. (2010). Stability approach to regularization selection (StARS) for high dimensional graphical models. Adv Neur In, 23.
Meinshausen, N. and Bühlmann, P. (2010). Stability selection. J Roy Stat Soc B, 72(4):417–473.
Roberts, S. and Nowak, G. (2014). Stabilizing the lasso against cross-validation variability. Comput Stat Data An, 70:198–211.
Schwarz, G. (1978). Estimating the dimension of a model. Ann Stat, 6(2):461–464.
Snoek, J., Larochelle, H., and Adams, R. P. (2012). Practical Bayesian optimization of machine learning algorithms. Adv Neur In, 25.
Tibshirani, R. (1996). Regression shrinkage and selection via the lasso. J Roy Stat Soc B, 58(1):267–288.
Tibshirani, R., Saunders, M., Rosset, S., Zhu, J., and Knight, K. (2005). Sparsity and smoothness via the fused lasso. J Roy Stat Soc B, 67(1):91–108.
van Nee, M. M., van de Brug, T., and van de Wiel, M. A. (2023). Fast marginal likelihood estimation of penalties for group-adaptive elastic net. J Comput Graph Stat, 32(3):950–960.
Wu, Y. and Wang, L. (2020). A survey of tuning parameter selection for high-dimensional regression. Annu Rev Stat Appl, 7(1):209–226.
Xue, B., Zhang, M., and Browne, W. N. (2012). Particle swarm optimization for feature selection in classification: A multi-objective approach. IEEE T Cybernetics, 43(6):1656–1671.
Zou, H. and Hastie, T. (2005). Regularization and variable selection via the elastic net. J Roy Stat Soc B, 67(2):301–320.
Zou, H., Hastie, T., and Tibshirani, R. (2007). On the "degrees of freedom" of the lasso. Ann Stat, 35(5):2173–2192.