
A Stepwise Approach for High-Dimensional

Gaussian Graphical Models


Ginette LAFIT, Francisco J. NOGALES, Marcelo RUIZ and Ruben H. ZAMAR
arXiv:1808.06016v1 [stat.ME] 17 Aug 2018

Abstract
We present a stepwise approach to estimate high-dimensional Gaussian graphical
models. We exploit the relation between the partial correlation coefficients and
the distribution of the prediction errors, and parametrize the model in terms of the
Pearson correlation coefficients between the prediction errors of the nodes' best linear
predictors. We propose a novel stepwise algorithm for detecting pairs of conditionally
dependent variables. We show that the proposed algorithm outperforms existing
methods such as the graphical lasso and CLIME in simulation studies and real-life
applications. In our comparison we report several performance measures that capture
different desirable features of the recovered graph, and we consider several model settings.

Keywords: Covariance Selection; Gaussian Graphical Model; Forward and Backward Selection; Partial Correlation Coefficient.

Ginette Lafit is Postdoctoral Research Fellow, Research Group of Quantitative Psychology and Individual Differences,
KU Leuven, University of Leuven, Leuven, Belgium (E-mail: [email protected]). Francisco J. Nogales is Professor,
Department of Statistics and UC3M-BS Institute of Financial Big Data, Universidad Carlos III de Madrid, España (E-mail:
[email protected]). Ruben H. Zamar is Professor, Department of Statistics, University of British Columbia, 3182
Earth Sciences Building, 2207 Main Mall, Vancouver, BC V6T 1Z4, Canada (E-mail: [email protected]). Marcelo Ruiz
is Professor, Departamento de Matemática, FCEFQyNat, Universidad Nacional de Río Cuarto, Córdoba, Argentina (E-mail:
[email protected]).

1 Introduction
High-dimensional Gaussian graphical models (GGM) are widely used in practice to represent the linear dependency between variables. The underlying idea in GGM is to measure linear dependencies by estimating partial correlations to infer whether there is an association between a given pair of variables, conditionally on the remaining ones. Moreover, there is a close relation between the nonzero partial correlation coefficients and the nonzero entries in the inverse of the covariance matrix. Covariance selection procedures take advantage of this fact to estimate the GGM conditional dependence structure given a sample (Dempster, 1972; Lauritzen, 1996; Edwards, 2000).

When the dimension p is larger than the number n of observations, the sample covariance matrix S is not invertible and the maximum likelihood estimate (MLE) of Σ does not exist. When p/n is smaller than, but close to, 1, S is invertible but ill-conditioned, which increases the estimation error (Ledoit and Wolf, 2004). To deal with this problem, several covariance selection procedures have been proposed based on the assumption that the inverse of the covariance matrix, Ω, called the precision matrix, is sparse.
We present an approach to perform covariance selection in a high dimensional GGM
based on a forward-backward algorithm called graphical stepwise (GS). Our procedure
takes advantage of the relation between the partial correlation and the Pearson correlation
coefficient of the residuals.
Existing methods to estimate the GGM can be classified into three classes: nodewise regression methods, maximum likelihood methods and limited-order partial correlation methods. The nodewise regression method was proposed by Meinshausen and Bühlmann (2006). This method estimates a lasso regression for each node in the graph. See for example Peng et al. (2009), Yuan (2010), Liu and Wang (2012), Zhou et al. (2011) and Ren et al. (2015). Penalized likelihood methods include Yuan and Lin (2007), Banerjee et al. (2008), Friedman et al. (2008), Johnson et al. (2011) and Ravikumar et al. (2011), among others. Cai et al. (2011) propose an estimator called CLIME that estimates precision matrices by solving the dual of an ℓ1 penalized maximum likelihood problem. Limited-order partial correlation procedures use lower order partial correlations to test for conditional independence relations. See Spirtes et al. (2000), Kalisch and Bühlmann (2007), Rütimann et al. (2009), Liang et al. (2015) and Huang et al. (2016).
The rest of the article is organized as follows. Section 2 introduces the stepwise approach along with some notation. Section 3 gives simulation results and a real data example. Section 4 presents some concluding remarks. The Appendix gives a detailed description of the cross-validation procedure used to determine the required parameters in our stepwise algorithm and some additional results from our simulation study.

2 Stepwise Approach to Covariance Selection

2.1 Definitions and Notation


In this section we review some definitions and technical concepts needed later on. Let G = (V, E) be a graph where V ≠ ∅ is the set of nodes or vertices and E ⊆ V × V = V² is the set of edges. For simplicity we assume that V = {1, . . . , p}. We assume that the graph G is undirected, that is, (i, j) ∈ E if and only if (j, i) ∈ E. Two nodes i and j are called connected, adjacent or neighbors if (i, j) ∈ E.

A graphical model (GM) is a graph such that V indexes a set of variables {X_1, . . . , X_p} and E is defined by

$$(i, j) \notin E \quad \text{if and only if} \quad X_i \perp\!\!\!\perp X_j \mid X_{V \setminus \{i,j\}}. \qquad (2.1)$$

Here ⊥⊥ denotes conditional independence.

Given a node i ∈ V, its neighborhood A_i is defined as

$$A_i = \{l \in V \setminus \{i\} : (i, l) \in E\}. \qquad (2.2)$$

Notice that A_i gives the nodes directly connected with i and therefore a GM can be effectively described by giving the system of neighborhoods {A_i}_{i=1}^p.

We further assume that (X_1, . . . , X_p)^⊤ ∼ N(0, Σ), where Σ = (σ_ij)_{i,j=1,...,p} is a positive-definite covariance matrix. In this case the graph is called a Gaussian graphical model (GGM). The matrix Ω = (ω_ij)_{i,j=1,...,p} = Σ^{-1} is called the precision matrix.
There exists an extensive literature on GM and GGM. For a detailed treatment of the
theory see for instance Lauritzen (1996), Edwards (2000), and Bühlmann and Van De Geer
(2011).

2.2 Conditional dependence in a GGM


In a GGM the set of edges E represents the conditional dependence structure of the vector
(X1 , . . . , Xp ). To represent this dependence structure as a statistical model it is convenient
to find a parametrization for E.
In this subsection we introduce a convenient parametrization of E using well known
results from classical multivariate analysis. For an exhaustive treatment of these results
see, for instance, Anderson (2003), Cramér (1999), Lauritzen (1996) and Eaton (2007).
Given a subset A of V, X_A denotes the vector of variables with subscripts in A in increasing order. For a given pair of nodes (i, l), set X_1^⊤ = (X_i, X_l), X_2 = X_{V∖{i,l}} and X = (X_1^⊤, X_2^⊤)^⊤. Note that X has a multivariate normal distribution with mean 0 and covariance matrix

$$\begin{pmatrix} \Sigma_{11} & \Sigma_{12} \\ \Sigma_{21} & \Sigma_{22} \end{pmatrix} \qquad (2.3)$$

such that Σ_11 has dimension 2 × 2, Σ_12 has dimension 2 × (p − 2) and so on. The matrix in (2.3) is a partition of a permutation of the original covariance matrix Σ and, with a small abuse of notation, will also be denoted by Σ.

Moreover, we set

$$\Omega = \begin{pmatrix} \Sigma_{11} & \Sigma_{12} \\ \Sigma_{21} & \Sigma_{22} \end{pmatrix}^{-1} = \begin{pmatrix} \Omega_{11} & \Omega_{12} \\ \Omega_{21} & \Omega_{22} \end{pmatrix}.$$

Then, by (B.2) of Lauritzen (1996), the blocks Ω_ij can be written explicitly in terms of Σ_ij and Σ_ij^{-1}. In particular,

$$\Omega_{11} = \left( \Sigma_{11} - \Sigma_{12} \Sigma_{22}^{-1} \Sigma_{21} \right)^{-1}, \quad \text{where} \quad \Omega_{11} = \begin{pmatrix} \omega_{ii} & \omega_{il} \\ \omega_{li} & \omega_{ll} \end{pmatrix}$$

is the submatrix of Ω with rows i and l and columns i and l. Hence,

$$\operatorname{cov}(X_1 \mid X_2) = \Sigma_{11} - \Sigma_{12} \Sigma_{22}^{-1} \Sigma_{21} = \Omega_{11}^{-1} = \frac{1}{\omega_{ii}\omega_{ll} - \omega_{il}\omega_{li}} \begin{pmatrix} \omega_{ll} & -\omega_{il} \\ -\omega_{li} & \omega_{ii} \end{pmatrix} \qquad (2.4)$$

and, in consequence, the partial correlation between X_i and X_l can be expressed as

$$\operatorname{corr}\left( X_i, X_l \mid X_{V \setminus \{i,l\}} \right) = -\frac{\omega_{il}}{\sqrt{\omega_{ii}\,\omega_{ll}}}. \qquad (2.5)$$

This gives the standard parametrization of E in terms of the support of the precision matrix,

$$\operatorname{supp}(\Omega) = \{(i, l) \in V^2 : i \neq l, \ \omega_{il} \neq 0\}. \qquad (2.6)$$

We now introduce another parametrization of E, which we need to define and implement our proposed method. We consider the regression error for the regression of X_1 on X_2,

$$\varepsilon = X_1 - \widehat{X}_1 = X_1 - \beta^{\top} X_2,$$

and let ε_i and ε_l denote the entries of ε (i.e. ε^⊤ = (ε_i, ε_l)). The regression error ε is independent of X̂_1 and has a normal distribution with mean 0 and covariance matrix Ψ_11 with elements denoted by

$$\Psi_{11} = \begin{pmatrix} \psi_{ii} & \psi_{il} \\ \psi_{li} & \psi_{ll} \end{pmatrix}. \qquad (2.7)$$

A straightforward calculation shows that

$$\Psi_{11} = \operatorname{cov}(X_1) + \operatorname{cov}(\widehat{X}_1) - 2\operatorname{cov}(X_1, \widehat{X}_1) = \Sigma_{11} + \Sigma_{12}\Sigma_{22}^{-1}\Sigma_{22}\Sigma_{22}^{-1}\Sigma_{21} - 2\Sigma_{12}\Sigma_{22}^{-1}\Sigma_{21} = \Sigma_{11} - \Sigma_{12}\Sigma_{22}^{-1}\Sigma_{21} = \Omega_{11}^{-1}.$$

See Cramér (1999, Section 23.4).

Therefore, by this equality, (2.4) and (2.5), the partial correlation coefficient and the conditional correlation are equal:

$$\rho_{il \cdot V \setminus \{i,l\}} = \operatorname{corr}\left( X_i, X_l \mid X_{V \setminus \{i,l\}} \right) = \frac{\psi_{il}}{\sqrt{\psi_{ii}\,\psi_{ll}}}.$$

Summarizing, the problem of determining the conditional dependence structure in a GGM (represented by E) is equivalent to finding the pairs of nodes of V that belong to the set

$$\{(i, l) \in V^2 : i \neq l, \ \psi_{il} \neq 0\}, \qquad (2.8)$$

which is equal to the support of the precision matrix, supp(Ω), defined by (2.6).
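Indeed, writing out the inverse in (2.4) entrywise makes this equivalence explicit:

$$\psi_{il} = -\frac{\omega_{il}}{\omega_{ii}\omega_{ll} - \omega_{il}\omega_{li}}, \qquad \psi_{ii} = \frac{\omega_{ll}}{\omega_{ii}\omega_{ll} - \omega_{il}\omega_{li}}, \qquad \psi_{ll} = \frac{\omega_{ii}}{\omega_{ii}\omega_{ll} - \omega_{il}\omega_{li}},$$

so ψ_il ≠ 0 if and only if ω_il ≠ 0, and ψ_il/√(ψ_ii ψ_ll) = −ω_il/√(ω_ii ω_ll), in agreement with (2.5).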

Remark 1. As noticed above, under normality, partial and conditional correlation are the same. However, in general they are different concepts (Lawrance, 1976).

Remark 2. Let β_{i,l} be the regression coefficient of X_l in the regression of X_i versus X_{V∖{i}} and, similarly, let β_{l,i} be the regression coefficient of X_i in the regression of X_l versus X_{V∖{l}}. Then it follows that ρ_{il·V∖{i,l}} = sign(β_{l,i}) √(β_{l,i} β_{i,l}). This allows for another popular parametrization of E. Moreover, let ε_i be the error term in the regression of the ith variable on the remaining ones. Then by Lemma 1 in Peng et al. (2009) we have that cov(ε_i, ε_l) = ω_il/(ω_ii ω_ll) and var(ε_i) = 1/ω_ii.
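The equivalence between the two parametrizations is easy to check numerically. The following is a minimal R sketch, not taken from the paper's code; it assumes an AR(1)-type covariance and uses base R regressions to compare the Pearson correlation of the two regression residuals with the partial correlation obtained from Ω via (2.5).

```r
## Illustrative check: residual correlation vs. partial correlation from (2.5).
set.seed(123)
p <- 5; n <- 5000
Sigma <- 0.4^abs(outer(1:p, 1:p, "-"))          # Sigma_ij = 0.4^|i-j|
Omega <- solve(Sigma)                           # precision matrix
x <- matrix(rnorm(n * p), n, p) %*% chol(Sigma) # rows ~ N(0, Sigma)

i <- 1; l <- 2
others <- setdiff(1:p, c(i, l))
e_i <- residuals(lm(x[, i] ~ x[, others]))      # residual of X_i on X_{V \ {i,l}}
e_l <- residuals(lm(x[, l] ~ x[, others]))      # residual of X_l on X_{V \ {i,l}}

cor(e_i, e_l)                                   # empirical correlation of the errors
-Omega[i, l] / sqrt(Omega[i, i] * Omega[l, l])  # partial correlation from (2.5)
```

With n = 5000 the two quantities agree to about two decimal places.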

2.3 The Stepwise Algorithm

Conditionally on its neighbors, X_i is independent of all the other variables. Formally, for all i,

$$\text{if } l \notin A_i \text{ and } l \neq i, \text{ then } X_i \perp\!\!\!\perp X_l \mid X_{A_i}. \qquad (2.9)$$

Therefore, given a system of neighborhoods {A_i}_{i=1}^p and l ∉ A_i (and so i ∉ A_l), the partial correlation between X_i and X_l can be obtained by the following procedure: (i) regress X_i on X_{A_i} and compute the regression residual ε_i; regress X_l on X_{A_l} and compute the regression residual ε_l; (ii) calculate the Pearson correlation between ε_i and ε_l.

This reasoning motivates the graphical stepwise algorithm (GSA). It begins with the family of empty neighborhoods, Â_j^(0) = ∅ for each j ∈ V. There are two basic steps, the forward and the backward steps. In the forward step, the algorithm adds a new edge (j_0, l_0) if the largest absolute empirical partial correlation between the variables X_{j_0}, X_{l_0} is above the given threshold α_f. In the backward step the algorithm deletes an edge (j_0, l_0) if the smallest absolute empirical partial correlation between the variables X_{j_0}, X_{l_0} is below the given threshold α_b. A step-by-step description of GSA is as follows:

Graphical Stepwise Algorithm

Input: the (centered) data {x_1, ..., x_n}, and the forward and backward thresholds α_f and α_b.

Initialization (k = 0): set Â_1^0 = Â_2^0 = · · · = Â_p^0 = ∅.

Iteration step: given Â_1^k, Â_2^k, ..., Â_p^k, we compute Â_1^{k+1}, Â_2^{k+1}, ..., Â_p^{k+1} as follows.

Forward. For each j = 1, ..., p do the following.

For each l ∉ Â_j^k calculate the partial correlation f_{jl}^k as follows.

(a) Regress the jth variable on the variables with subscripts in the set Â_j^k and compute the regression residuals e_j^k = (e_{1j}^k, e_{2j}^k, ..., e_{nj}^k).

(b) Regress the lth variable on the variables with subscripts in the set Â_l^k and compute the regression residuals e_l^k = (e_{1l}^k, e_{2l}^k, ..., e_{nl}^k).

(c) Obtain the partial correlation f_{jl}^k by calculating the Pearson correlation between e_j^k and e_l^k.

If

$$\max_{l \notin \widehat{A}_j^k,\ j \in V} \left| f_{jl}^k \right| = \left| f_{j_0 l_0}^k \right| \geq \alpha_f,$$

set Â_{j_0}^{k+1} = Â_{j_0}^k ∪ {l_0}, Â_{l_0}^{k+1} = Â_{l_0}^k ∪ {j_0} and Â_l^{k+1} = Â_l^k for l ≠ j_0, l_0.

If

$$\max_{l \notin \widehat{A}_j^k,\ j \in V} \left| f_{jl}^k \right| = \left| f_{j_0 l_0}^k \right| < \alpha_f, \quad \text{stop.}$$

Backward. For each j = 1, ..., p do the following.

For each l ∈ Â_j^{k+1} calculate the partial correlation b_{jl}^k as follows.

(a) Regress the jth variable on the variables with subscripts in the set Â_j^{k+1} ∖ {l} and compute the regression residuals r_j^k = (r_{1j}^k, r_{2j}^k, ..., r_{nj}^k).

(b) Regress the lth variable on the variables with subscripts in the set Â_l^{k+1} ∖ {j} and compute the regression residuals r_l^k = (r_{1l}^k, r_{2l}^k, ..., r_{nl}^k).

(c) Compute the partial correlation b_{jl}^k by calculating the Pearson correlation between r_j^k and r_l^k.

If

$$\min_{l \in \widehat{A}_j^{k+1},\ j \in V} \left| b_{jl}^k \right| = \left| b_{j_0 l_0}^k \right| \leq \alpha_b,$$

set Â_{j_0}^{k+1} ← Â_{j_0}^{k+1} ∖ {l_0} and Â_{l_0}^{k+1} ← Â_{l_0}^{k+1} ∖ {j_0}.

Output

1. A collection of estimated neighborhoods Â_j, j = 1, ..., p.

2. The set of estimated edges Ê = {(i, l) ∈ V² : i ∈ Â_l}.

3. An estimate Ω̂ = (ω̂_il)_{i,l=1}^p of Ω, with ω̂_il defined as follows. In the case i = l, ω̂_ii = n/(e_i^⊤ e_i) for i = 1, ..., p, where e_i is the vector of the prediction errors in the regression of the ith variable on X_{Â_i}. In the case i ≠ l we must distinguish two cases: if l ∉ Â_i then ω̂_il = 0, otherwise ω̂_il = n e_i^⊤ e_l / ((e_i^⊤ e_i)(e_l^⊤ e_l)) (see Remark 2).
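For concreteness, a minimal R sketch of this forward-backward search is given below. It is only an illustration of the steps described above, not the authors' own code (which is available by request); the helper names resid_on and gs_stepwise are ours, and the backward scan is simplified to examine all current edges after each forward addition.

```r
## Residual of the regression of column j of x on the columns in `nbrs`
## (the centered column itself when the neighborhood is empty).
resid_on <- function(x, j, nbrs) {
  if (length(nbrs) == 0) return(x[, j] - mean(x[, j]))
  lm.fit(cbind(1, x[, nbrs, drop = FALSE]), x[, j])$residuals
}

gs_stepwise <- function(x, alpha_f, alpha_b, max_iter = 200) {
  p <- ncol(x)
  A <- vector("list", p)                 # neighborhoods, start empty
  for (k in seq_len(max_iter)) {
    ## Forward step: largest absolute residual correlation over pairs not yet linked.
    res  <- lapply(seq_len(p), function(j) resid_on(x, j, A[[j]]))
    best <- c(0, NA, NA)
    for (j in seq_len(p - 1)) for (l in (j + 1):p) {
      if (l %in% A[[j]]) next
      f <- abs(cor(res[[j]], res[[l]]))
      if (f > best[1]) best <- c(f, j, l)
    }
    if (best[1] < alpha_f) break         # no eligible pair above the threshold: stop
    j0 <- best[2]; l0 <- best[3]
    A[[j0]] <- c(A[[j0]], l0); A[[l0]] <- c(A[[l0]], j0)
    ## Backward step: drop the weakest current edge if it falls at or below alpha_b.
    worst <- c(Inf, NA, NA)
    for (j in seq_len(p)) for (l in A[[j]]) {
      if (l < j) next                    # scan each edge once
      rj <- resid_on(x, j, setdiff(A[[j]], l))
      rl <- resid_on(x, l, setdiff(A[[l]], j))
      b <- abs(cor(rj, rl))
      if (b < worst[1]) worst <- c(b, j, l)
    }
    if (is.finite(worst[1]) && worst[1] <= alpha_b) {
      j0 <- worst[2]; l0 <- worst[3]
      A[[j0]] <- setdiff(A[[j0]], l0); A[[l0]] <- setdiff(A[[l0]], j0)
    }
  }
  A                                      # list of estimated neighborhoods
}
```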

2.4 Threshold selection by cross-validation

Let X be the n × p matrix with rows x_i = (x_{i1}, . . . , x_{ip}), i = 1, . . . , n, corresponding to n observations. We randomly partition the dataset {x_i}_{1≤i≤n} into K disjoint subsets of approximately equal sizes, the tth subset being of size n_t ≥ 2 with Σ_{t=1}^K n_t = n. For every t, let {x_i^{(t)}}_{1≤i≤n_t} be the tth validation subset, and its complement {x̃_i^{(t)}}_{1≤i≤n−n_t} the tth training subset. For every t and for every pair (α_f, α_b) of threshold parameters, let Â_1^{(t)}, . . . , Â_p^{(t)} be the estimated neighborhoods given by GSA using the tth training subset. For every j = 1, . . . , p, let β̂_{Â_j^{(t)}} be the estimated coefficient vector of the regression of the variable X_j on the neighborhood Â_j^{(t)}.

Consider now the tth validation subset. For every j, using β̂_{Â_j^{(t)}}, we obtain the vector of predicted values X̂_j^{(t)}(α_f, α_b). If Â_j^{(t)} = ∅ we predict each observation of X_j by the sample mean of that variable's observations in the tth subset.

Then, we define the K-fold cross-validation function as

$$CV(\alpha_f, \alpha_b) = \frac{1}{n} \sum_{t=1}^{K} \sum_{j=1}^{p} \left\| X_j^{(t)} - \widehat{X}_j^{(t)}(\alpha_f, \alpha_b) \right\|^2,$$

where ‖·‖ denotes the L2 (euclidean) norm. Hence the K-fold cross-validation forward-backward thresholds (α̂_f, α̂_b) are

$$(\widehat{\alpha}_f, \widehat{\alpha}_b) = \operatorname*{argmin}_{(\alpha_f, \alpha_b) \in H} CV(\alpha_f, \alpha_b),$$

where H is a grid of ordered pairs (α_f, α_b) in [0, 1] × [0, 1] over which we perform the search. For a detailed description see the Appendix.
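A minimal R sketch of this grid search is given below; it assumes the illustrative gs_stepwise() helper sketched in Section 2.3, and the grid, fold assignment and helper name cv_thresholds are ours rather than the paper's.

```r
## K-fold cross-validation over a grid of (alpha_f, alpha_b) thresholds (a sketch).
cv_thresholds <- function(x, grid_f, grid_b, K = 5) {
  n <- nrow(x); p <- ncol(x)
  fold <- sample(rep_len(seq_len(K), n))          # random K-fold assignment
  cv_one <- function(af, ab) {
    err <- 0
    for (t in seq_len(K)) {
      tr <- x[fold != t, , drop = FALSE]          # training subset
      va <- x[fold == t, , drop = FALSE]          # validation subset
      A  <- gs_stepwise(tr, af, ab)               # neighborhoods from the training data
      for (j in seq_len(p)) {
        if (length(A[[j]]) == 0) {
          pred <- rep(mean(va[, j]), nrow(va))    # empty neighborhood: predict by the mean
        } else {
          fit  <- lm.fit(cbind(1, tr[, A[[j]], drop = FALSE]), tr[, j])
          pred <- cbind(1, va[, A[[j]], drop = FALSE]) %*% fit$coefficients
        }
        err <- err + sum((va[, j] - pred)^2)
      }
    }
    err / n
  }
  H <- expand.grid(alpha_f = grid_f, alpha_b = grid_b)
  H$cv <- mapply(cv_one, H$alpha_f, H$alpha_b)
  H[which.min(H$cv), c("alpha_f", "alpha_b")]     # minimizer of CV(alpha_f, alpha_b)
}
```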

2.5 Example
To illustrate the algorithm we consider the GGM with 16 edges given in the first panel of Figure 1. We draw n = 1000 independent observations from this model (see the next section for details). The values of the threshold parameters, α_f = 0.17 and α_b = 0.09, are determined by 5-fold cross-validation. The figure also displays the selected pairs of edges at each step in a sequence of successive updates of Â_j^k, for k = 1, 4, 9, 12 and the final step k = 16, showing that the estimated graph is identical to the true graph.


7 ●
6

5 ●
7 ●
6

5 ●
7 ●
6

5

8 ●
4 ●
8 ●
4 ●
8 ●
4

●9 ●3 ●9 ●3 ●9 ●3

●10 ●2 ●10 ●2 ●10 ●


2


11 ● 1 ●
11 ● 1 ●
11 ● 1

●12 ●
20 ●12 ●
20 ●12 ●
20


13 ●
19 ●
13 ●
19 ●
13 ●
19


14 ●
18 ●
14 ●
18 ●
14 ●
18

15

16 ●
17 ●
15

16 ●
17 ●
15

16 ●
17

True graph k=1 k=4


7 ●
6

5 ●
7 ●
6

5 ●
7 ●
6

5

8 ●
4 ●
8 ●
4 ●
8 ●
4

●9 ●3 ●9 ●3 ●9 ●3

●10 ●2 ●10 ●2 ●10 ●


2


11 ● 1 ●
11 ● 1 ●
11 ● 1

●12 ●
20 ●12 ●
20 ●12 ●
20


13 ●
19 ●
13 ●
19 ●
13 ●
19


14 ●
18 ●
14 ●
18 ●
14 ●
18

15

16 ●
17 ●
15

16 ●
17 ●
15

16 ●
17

k=9 k = 12 k = 16

bk , for k = 1, 4, 9, 12, 16 of the GSA.


Figure 1: True graph and sequence of successive updates of Aj

3 Numerical results and real data example


We conducted extensive Monte Carlo simulations to investigate the performance of GS. In
this section we report some results from this study and a numerical experiment using real
data.

3.1 Monte Carlo simulation study

Simulated Models

We consider three dimension values p = 50, 100, 150 and three different models for Ω:

Model 1. Autoregressive model of order 1, denoted AR(1). In this case Σ_ij = 0.4^{|i−j|} for i, j = 1, . . . , p.

Model 2. Nearest neighbors model of order 2, denoted NN(2). For each node we randomly select two neighbors and choose a pair of symmetric entries of Ω using the NeighborOmega function of the R package Tlasso.

Model 3. Block diagonal matrix model with q blocks of size p/q, denoted BG. For p = 50, 100 and 150, we use q = 10, 20 and 30 blocks, respectively. Each block, of size p/q = 5, has diagonal elements equal to 1 and off-diagonal elements equal to 0.5.

For each p and each model we generate R = 50 random samples of size n = 100. These graph models are widely used in the genetic literature to model gene expression data. See for example Lee and Liu (2015) and Li and Gui (2006). Figure 2 displays graphs from Models 1-3 with p = 100 nodes.
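The AR(1) and BG structures are simple to construct directly. The sketch below is illustrative only (it is not the simulation code of the paper): it builds Σ or Ω with base R plus the Matrix package for the block-diagonal precision matrix, omits NN(2) since that model relies on Tlasso::NeighborOmega, and samples Gaussian data through a Cholesky factor.

```r
## Sigma for Model 1: Sigma_ij = 0.4^|i-j|
make_ar1_sigma <- function(p, rho = 0.4) {
  rho^abs(outer(seq_len(p), seq_len(p), "-"))
}
## Omega for Model 3: blocks of size 5 with 1 on the diagonal and 0.5 off-diagonal
make_bg_omega <- function(p, block = 5, off = 0.5) {
  B <- matrix(off, block, block); diag(B) <- 1
  as.matrix(Matrix::bdiag(replicate(p / block, B, simplify = FALSE)))
}
## Draw n rows from N(0, Sigma) via a Cholesky factor
sample_gaussian <- function(n, Sigma) {
  p <- ncol(Sigma)
  matrix(rnorm(n * p), n, p) %*% chol(Sigma)
}

set.seed(1)
X_ar1 <- sample_gaussian(100, make_ar1_sigma(100))        # Model 1, p = 100, n = 100
X_bg  <- sample_gaussian(100, solve(make_bg_omega(100)))  # Model 3, p = 100, n = 100
```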

Figure 2: Graphs of the AR(1), NN(2) and BG graphical models for p = 100 nodes (panels, left to right: AR(1), NN(2), BG).

Methods

We compare the performance of GS with the graphical lasso (Glasso) and with constrained ℓ1-minimization for inverse matrix estimation (CLIME), proposed by Friedman et al. (2008) and Cai et al. (2011), respectively. Therefore, the methods compared in our simulation study are:

1. The proposed method GS with the forward and backward thresholds, (α_f, α_b), estimated by 5-fold cross-validation on a grid of 20 values in [0, 1] × [0, 1], as described in Subsection 2.4. The computing algorithm is available by request.

2. The Glasso estimate obtained by solving the ℓ1 penalized-likelihood problem

$$\min_{\Omega \succ 0} \; -\log\{\det[\Omega]\} + \operatorname{tr}\{\Omega X^{\top} X\} + \lambda \|\Omega\|_1. \qquad (3.1)$$

In our simulations and examples we use the R package CVglasso with the tuning parameter λ selected by 5-fold cross-validation (the package default).

3. The CLIME estimate obtained by symmetrization of the solution of

$$\min \left\{ \|\Omega\|_1 \ \text{subject to} \ |S\Omega - I|_\infty \leq \lambda \right\}, \qquad (3.2)$$

where S is the sample covariance matrix, I is the identity matrix, |·|_∞ is the elementwise ℓ∞ norm, and λ is a tuning parameter. For computations, we use the R package clime with the tuning parameter λ selected by 5-fold cross-validation (the package default).

To evaluate the ability of the methods to find the pairs of edges, for each replicate we compute the Matthews correlation coefficient (Matthews, 1975),

$$\text{MCC} = \frac{\text{TP} \times \text{TN} - \text{FP} \times \text{FN}}{\sqrt{(\text{TP} + \text{FP})(\text{TP} + \text{FN})(\text{TN} + \text{FP})(\text{TN} + \text{FN})}}, \qquad (3.3)$$

the Specificity = TN/(TN + FP) and the Sensitivity = TP/(TP + FN), where TP, TN, FP and FN are, in this order, the numbers of true positives, true negatives, false positives and false negatives regarding the identification of the nonzero off-diagonal elements of Ω. Larger values of MCC, Sensitivity and Specificity indicate a better performance (Fan et al., 2009; Baldi et al., 2000).
For every replicate, the performance of Ω̂ as an estimate of Ω is measured by m_F = ‖Ω̂ − Ω‖_F (where ‖·‖_F denotes the Frobenius norm) and by the normalized Kullback-Leibler divergence defined by m_NKL = D_KL/(1 + D_KL), where

$$D_{KL} = \frac{1}{2} \left\{ \operatorname{tr}\left( \Omega \widehat{\Omega}^{-1} \right) - \log\left[ \det\left( \Omega \widehat{\Omega}^{-1} \right) \right] - p \right\}$$

is the Kullback-Leibler divergence between Ω̂ and Ω.
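As a concrete reference for these measures, the following R helpers are an illustrative sketch (the function names are ours): the support comparison treats off-diagonal entries with absolute value below a small tolerance as zero, and the Kullback-Leibler term follows the expression for D_KL given above.

```r
## Support-recovery scores: sensitivity, specificity and MCC over off-diagonal entries.
support_scores <- function(Omega_hat, Omega, tol = 1e-8) {
  off <- upper.tri(Omega)                     # each off-diagonal pair counted once
  est <- abs(Omega_hat[off]) > tol
  tru <- abs(Omega[off]) > tol
  TP <- sum(est & tru); TN <- sum(!est & !tru)
  FP <- sum(est & !tru); FN <- sum(!est & tru)
  mcc <- (TP * TN - FP * FN) /
    sqrt((TP + FP) * (TP + FN) * (TN + FP) * (TN + FN))
  c(sensitivity = TP / (TP + FN),
    specificity = TN / (TN + FP),
    mcc = mcc)
}

## Estimation losses: Frobenius norm m_F and normalized Kullback-Leibler m_NKL.
estimation_losses <- function(Omega_hat, Omega) {
  mF  <- norm(Omega_hat - Omega, type = "F")
  M   <- Omega %*% solve(Omega_hat)           # as in the D_KL expression above
  dkl <- 0.5 * (sum(diag(M)) -
                as.numeric(determinant(M, logarithm = TRUE)$modulus) - nrow(M))
  c(mF = mF, mNKL = dkl / (1 + dkl))
}
```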

Results
Table 1 shows the MCC performance of the three methods under Models 1-3. GS clearly outperforms the other two methods, while CLIME only slightly outperforms Glasso. Cai et al. (2011) pointed out that a procedure yielding a sparser Ω̂ is preferable because this facilitates interpretation of the data. The sensitivity and specificity results, reported in Table 4 in the Appendix, show that in general GS is sparser than CLIME and Glasso, yielding fewer false positives (higher specificity) but a few more false negatives (lower sensitivity). Table 2 shows that under models AR(1) and NN(2) the three methods achieve fairly similar performance for estimating Ω. However, under model BG, GS clearly outperforms the other two.

Figure 3 displays the heat-maps of the number of non-zero links identified in the 50 replications under model AR(1). Notice that among the three compared methods, the GS sparsity patterns best match those of the true model. Figures 4 and 5 in the Appendix lead to similar conclusions for models NN(2) and BG.

Table 1: Comparison of means and standard deviations (in brackets) of MCC over R = 50 replicates.

Model p GS Glasso CLIME


50 0.741 (0.009) 0.419 (0.016) 0.492 (0.006)
AR(1) 100 0.751 (0.004) 0.433 (0.020) 0.464 (0.004)
150 0.730 (0.004) 0.474 (0.017) 0.499 (0.003)
50 0.751 (0.004) 0.404 (0.014) 0.401 (0.007)
NN(2) 100 0.802 (0.005) 0.382 (0.006) 0.407 (0.005)
150 0.695 (0.007) 0.337 (0.008) 0.425 (0.003)
50 0.898 (0.005) 0.356 (0.009) 0.482 (0.005)
BG 100 0.857 (0.005) 0.348 (0.004) 0.461 (0.002)
150 0.780 (0.008) 0.314 (0.003) 0.408 (0.003)

(a) p = 50

(b) p = 100

(c) p = 150

Figure 3: Model AR(1). Heatmaps of the frequency of the zeros identified for each entry of the precision matrix out of
R = 50 replicates. White color is 50 zeros identified out of 50 runs, and black is 0/50.

Table 2: Comparison of means and standard deviations (in brackets) of mF and mN KL over R = 50 replicates.

GS Glasso CLIME
Model p mN KL mF mN KL mF mN KL mF
50 0.70 3.82 0.64 3.90 0.63 3.91
(0.00) (0.00) (0.00) ( 0.02) (0.00) (0.01)
AR(1) 100 0.83 5.73 0.80 5.72 0.79 5.75
(0.00) (0.00) (0.00) (0.02) (0.00) (0.01)
150 1.25 7.16 1.17 7.21 1.17 7.25
(0.00) (0.00) (0.00) (0.02) (0.00) (0.01)
50 0.99 6.98 0.99 6.65 0.99 6.64
(0.00) (0.00) (0.00) (0.01) (0.00) (0.00)
NN(2) 100 0.10 10.11 1.00 9.64 1.00 9.601
(0.00) (0.00) (0.00) (0.009) (0.000) (0.005)
150 1.00 12.37 1.00 11.90 1.00 11.79
(0.00) (0.00) (0.00) (0.01) (0.00) (0.00)
BG 50 0.46 1.44 0.85 5.45 0.82 5.03
(0.00) (0.00) (0.00) (0.10) (0.00) (0.05)
100 0.71 2.94 0.93 9.16 0.92 8.71
(0.00) (0.00) (0.00) (0.07) (0.00) (0.02)
150 0.88 6.10 0.96 11.59 0.96 11.42
(0.00) (0.00) (0.00) (0.06) (0.00) (0.02)

3.2 Analysis of Breast Cancer Data


In preoperative chemotherapy, the complete eradication of all invasive cancer cells is referred to as pathological complete response, abbreviated as pCR. It is known in medicine that pCR is associated with the long-term cancer-free survival of a patient. Gene expression profiling (GEP), the measurement of the activity (expression level) of genes in a patient, could in principle be a useful predictor of the patient's pCR.

Using normalized gene expression data of patients in stages I-III of breast cancer, Hess et al. (2006) aim to identify patients that may achieve pCR under sequential anthracycline-paclitaxel preoperative chemotherapy. When a patient does not achieve pCR, the patient is classified in the residual disease (RD) group, indicating that cancer still remains. Their data consist of 22283 gene expression levels for 133 patients, 34 with pCR and 99 with RD. Following Fan et al. (2009) and Cai et al. (2011), we randomly split the data into a training set and a testing set. The testing set is formed by randomly selecting 5 pCR patients and 16 RD patients (roughly 1/6 of the subjects) and the remaining patients form the training set. From the training set, a two-sample t-test is performed to select the 50 most significant genes. The data are then standardized using the standard deviation estimated from the training set.
We apply a linear discriminant analysis (LDA) to predict whether a patient may achieve pathological complete response (pCR), based on the estimated inverse covariance matrix of the gene expression levels. We label with r = 1 the pCR group and with r = 2 the RD group, and assume that the data are normally distributed, with common covariance matrix Σ and different means μ_r. From the training set we obtain μ̂_r and Ω̂, and for each observation x in the test data we compute the linear discriminant score

$$\delta_r(x) = x^{\top} \widehat{\Omega} \widehat{\mu}_r - \frac{1}{2} \widehat{\mu}_r^{\top} \widehat{\Omega} \widehat{\mu}_r + \log \widehat{\pi}_r, \qquad r = 1, 2, \qquad (3.4)$$

where π̂_r is the proportion of group r subjects in the training set. The classification rule is

$$\widehat{r}(x) = \operatorname*{argmax}_{r = 1, 2} \delta_r(x). \qquad (3.5)$$
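A small R sketch of the rule (3.4)-(3.5) is given below. It assumes an estimated precision matrix Omega_hat, a list of estimated group means mu_hat and a vector of estimated priors pi_hat (all names are ours), and returns the predicted group for each row of the test matrix.

```r
## Linear discriminant scores delta_r(x) and the argmax classification rule.
lda_predict <- function(x_test, Omega_hat, mu_hat, pi_hat) {
  scores <- sapply(seq_along(mu_hat), function(r) {
    m <- mu_hat[[r]]
    drop(x_test %*% Omega_hat %*% m) -           # x' Omega_hat mu_r
      0.5 * drop(t(m) %*% Omega_hat %*% m) +     # - (1/2) mu_r' Omega_hat mu_r
      log(pi_hat[r])                             # + log pi_r
  })                                             # n_test x 2 matrix of scores
  max.col(scores)                                # r-hat(x): group with the largest score
}
```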

For every method we use 5-fold cross-validation on the training data to select the tuning constants. We repeat this scheme 100 times.

Table 3 displays the means and standard errors (in brackets) of Sensitivity, Specificity, MCC and number of selected edges based on Ω̂ over the 100 replications. In terms of MCC, GS is slightly better than CLIME, and CLIME is better than Glasso. While the three methods give similar performance in terms of Specificity, GS and CLIME improve over Glasso in terms of Sensitivity.

Table 3: Comparison of means and standard deviations (in brackets) of Sensitivity, Specificity, MCC and Number of selected
edges over 100 replications.

GS CLIME Glasso
Sensitivity 0.798 (0.02) 0.786 (0.02) 0.602 (0.02)
Specificity 0.784 (0.01) 0.788 (0.01) 0.767 (0.01)
MCC 0.520 (0.02) 0.516 (0.02) 0.334 (0.02)
Number of Edges 54 (2) 4823 (8) 2103 (76)

4 Concluding remarks
This paper introduces a stepwise procedure, called GS, to perform covariance selection in high-dimensional Gaussian graphical models. Our method uses a different parametrization of the Gaussian graphical model based on the Pearson correlations between the prediction errors of the nodes' best linear predictors. The GS algorithm begins with a family of empty neighborhoods and, using two basic steps, forward and backward, adds or deletes edges until the corresponding thresholds are reached. These thresholds are automatically determined by cross-validation.

GS is compared with Glasso and CLIME under different Gaussian graphical models (AR(1), NN(2) and BG) and using different performance measures regarding network recovery and sparse estimation of the precision matrix Ω. GS is shown to have good support recovery performance and to produce simpler models than the other two methods (i.e. GS is a parsimonious estimation procedure).
We use GS for the analysis of breast cancer data and show that this method may be a
useful tool for applications in medicine and other fields.

Acknowledgements
The authors gratefully acknowledge the generous support of NSERC, Canada, the Institute of Financial Big Data, University Carlos III of Madrid, and the CSIC, Spain.

A Appendix

A.1 Selection of the threshold parameters by cross-validation


Let X be the n × p matrix with rows x_i = (x_{i1}, . . . , x_{ip}), i = 1, . . . , n, corresponding to n observations. For each j = 1, . . . , p, let X_j = (x_{1j}, . . . , x_{nj})^⊤ denote the jth column of the matrix X.

We randomly partition the dataset {x_i}_{1≤i≤n} into K disjoint subsets of approximately equal size, the tth subset being of size n_t ≥ 2 with Σ_{t=1}^K n_t = n. For every t, let {x_i^{(t)}}_{1≤i≤n_t} be the tth validation subset, and its complement {x̃_i^{(t)}}_{1≤i≤n−n_t} the tth training subset.

For every t = 1, . . . , K and threshold parameters (α_f, α_b) ∈ [0, 1] × [0, 1], let Â_1^{(t)}, . . . , Â_p^{(t)} be the estimated neighborhoods given by GSA using the tth training subset {x̃_i^{(t)}}_{1≤i≤n−n_t}, with x̃_i^{(t)} = (x̃_{i1}^{(t)}, . . . , x̃_{ip}^{(t)}), 1 ≤ i ≤ n − n_t. Consider for every node j the estimated neighborhood Â_j^{(t)} = {l_1, . . . , l_q} and let β̂_{Â_j^{(t)}} be the estimated coefficient vector of the regression of X̃_j^{(t)} = (x̃_{1j}^{(t)}, . . . , x̃_{n−n_t,j}^{(t)})^⊤ on X̃_{l_1}^{(t)}, . . . , X̃_{l_q}^{(t)}, represented in (A.2) (in red).

Consider the tth validation subset {x_i^{(t)}}_{1≤i≤n_t} with x_i^{(t)} = (x_{i1}^{(t)}, . . . , x_{ip}^{(t)}), 1 ≤ i ≤ n_t, and for every j let X_j^{(t)} = (x_{1j}^{(t)}, . . . , x_{n_t j}^{(t)})^⊤ and define the vector of predicted values

$$\widehat{X}_j^{(t)}(\alpha_f, \alpha_b) = X^{(t)}_{\widehat{A}_j^{(t)}} \, \widehat{\beta}_{\widehat{A}_j^{(t)}},$$

where X^{(t)}_{Â_j^{(t)}} is the matrix with rows (x_{i l_1}^{(t)}, . . . , x_{i l_q}^{(t)}), 1 ≤ i ≤ n_t, represented in (A.2) (in blue). If the neighborhood Â_j^{(t)} = ∅ we define

$$\widehat{X}_j^{(t)}(\alpha_f, \alpha_b) = (\bar{x}_j^{(t)}, \ldots, \bar{x}_j^{(t)})^{\top},$$

where x̄_j^{(t)} is the mean of the sample of observations x_{1j}^{(t)}, . . . , x_{n_t j}^{(t)}.

We define the K-fold cross-validation function as

$$CV(\alpha_f, \alpha_b) = \frac{1}{n} \sum_{t=1}^{K} \sum_{j=1}^{p} \left\| X_j^{(t)} - \widehat{X}_j^{(t)}(\alpha_f, \alpha_b) \right\|^2,$$

where ‖·‖ denotes the L2 (euclidean) norm. Hence the K-fold cross-validation forward-backward thresholds (α̂_f, α̂_b) are

$$(\widehat{\alpha}_f, \widehat{\alpha}_b) = \operatorname*{argmin}_{(\alpha_f, \alpha_b) \in H} CV(\alpha_f, \alpha_b), \qquad (A.1)$$

where H is a grid of ordered pairs (α_f, α_b) in [0, 1] × [0, 1] over which we perform the search.

$$\left(\begin{array}{ccccccc}
\multicolumn{7}{c}{t\text{th training subset}} \\
\cdots & \tilde{x}^{(t)}_{1j} & \cdots & \tilde{x}^{(t)}_{1 l_1} & \cdots & \tilde{x}^{(t)}_{1 l_q} & \cdots \\
\vdots & \vdots & \vdots & \vdots & \vdots & \vdots & \vdots \\
\cdots & \tilde{x}^{(t)}_{n-n_t, j} & \cdots & \tilde{x}^{(t)}_{n-n_t, l_1} & \cdots & \tilde{x}^{(t)}_{n-n_t, l_q} & \cdots \\
\multicolumn{7}{c}{t\text{th validation subset}} \\
\cdots & x^{(t)}_{1j} & \cdots & x^{(t)}_{1 l_1} & \cdots & x^{(t)}_{1 l_q} & \cdots \\
\vdots & \vdots & \vdots & \vdots & \vdots & \vdots & \vdots \\
\cdots & x^{(t)}_{n_t j} & \cdots & x^{(t)}_{n_t l_1} & \cdots & x^{(t)}_{n_t l_q} & \cdots
\end{array}\right) \qquad (A.2)$$

Remark 3. Matrix (A.2) represents, for every node j, the comparison between estimated and predicted values used for cross-validation. β̂_{Â_j^{(t)}} is computed using the observations X̃_j^{(t)} = (x̃_{1j}^{(t)}, . . . , x̃_{n−n_t, j}^{(t)})^⊤ and the matrix X̃^{(t)}_{Â_j^{(t)}} with rows (x̃_{i l_1}^{(t)}, . . . , x̃_{i l_q}^{(t)}), i = 1, . . . , n − n_t, in the tth training subset (in red). Based on the tth validation set, X̂_j^{(t)} is computed using X^{(t)}_{Â_j^{(t)}} and compared with X_j^{(t)} (in blue).

Table 4: Comparison of means and standard deviations (in brackets) of Specificity, Sensitivity and MCC over R = 50
replicates.

GS Glasso CLIME
Model p Sensitivity Specificity MCC Sensitivity Specificity MCC Sensitivity Specificity MCC
50 0.756 0.988 0.741 0.994 0.823 0.419 0.988 0.891 0.492
(0.015) (0.002) (0.009) (0.002) (0.012) (0.016) (0.002) (0.003) (0.006)
AR(1) 100 0.632 0.999 0.751 0.989 0.897 0.433 0.983 0.934 0.464
(0.007) (0.000) (0.004) (0.002) (0.009) (0.020) (0.002) (0.001) (0.004)
150 0.607 0.999 0.730 0.981 0.943 0.474 0.972 0.964 0.499
(0.006) (0.000) (0.004) (0.002) (0.007) (0.017) (0.002) (0.001) (0.003)
50 0.632 0.999 0.751 0.971 0.864 0.404 0.984 0.875 0.401
(0.007) (0.000) (0.004 ) (0.004) (0.010) (0.014) (0.003) (0.004) (0.007)
NN(2) 100 0.730 0.999 0.802 0.987 0.924 0.382 0.985 0.937 0.407
(0.008) (0.000) (0.005) (0.002) (0.004) (0.006) (0.002) (0.001) (0.005)
150 0.555 0.999 0.695 0.952 0.936 0.337 0.934 0.965 0.425
(0.017) (0.000) (0.007) (0.004) (0.002) (0.008) ( 0.003) (0.001) (0.003)
50 0.994 0.981 0.898 0.867 0.697 0.356 0.962 0.807 0.482
(0.002) (0.001) (0.005) (0.032) (0.021) (0.009) (0.004) (0.005) (0.005)
BG 100 0.949 0.989 0.857 0.569 0.908 0.348 0.818 0.920 0.4615
(0.007) (0.000) (0.005) (0.039) (0.011) ( 0.004) (0.005) (0.005) (0.002)
150 0.782 0.994 0.780 0.426 0.952 0.314 0.626 0.959 0.408
(0.021) (0.000) (0.008) (0.035) (0.006) (0.003) (0.006) (0.001) (0.003)

A.2 Complementary simulation results

(a) p = 50

(b) p = 100

(c) p = 150

Figure 4: Model NN(2). Heatmaps of the frequency of the zeros identified for each entry of the precision matrix out of
R = 50 replications. White color is 50 zeros identified out of 50 runs, and black is 0/50.

(a) p = 50

(b) p = 100

(c) p = 150

Figure 5: Model BG. Heatmaps of the frequency of the zeros identified for each entry of the precision matrix out of R = 50
replications. White color is 50 zeros identified out of 50 runs, and black is 0/50.

References
Anderson, T. (2003). An Introduction to Multivariate Statistical Analysis. John Wiley.

Baldi, P., S. Brunak, Y. Chauvin, C. Andersen, and H. Nielsen (2000). Assessing the
accuracy of prediction algorithms for classification: An overview. Bioinformatics 16 (5),
412–424.

Banerjee, O., L. El Ghaoui, and A. d’Aspremont (2008). Model selection through sparse
maximum likelihood estimation for multivariate gaussian or binary data. The Journal
of Machine Learning Research 9, 485–516.

Bühlmann, P. and S. Van De Geer (2011). Statistics for high-dimensional data: methods,
theory and applications. Springer Science & Business Media.

Cai, T., W. Liu, and X. Luo (2011). A constrained `1 minimization approach to sparse
precision matrix estimation. Journal of the American Statistical Association 106 (494),
594–607.

Cramér, H. (1999). Mathematical Methods of Statistics. Princeton University Press.

Dempster, A. P. (1972). Covariance selection. Biometrics, 157–175.

Eaton, M. L. (2007). Multivariate Statistics : A Vector Space Approach. Institute of


Mathematical Statistics.

Edwards, D. (2000). Introduction to Graphical Modelling. Springer Science & Business


Media.

Fan, J., Y. Feng, and Y. Wu (2009). Network exploration via the adaptive lasso and scad
penalties. The Annals of Applied Statistics 3 (2), 521–541.

Friedman, J., T. Hastie, and R. Tibshirani (2008). Sparse inverse covariance estimation
with the graphical lasso. Biostatistics 9 (3), 432–441.

Hess, K. R., K. Anderson, W. F. Symmans, V. Valero, N. Ibrahim, J. A. Mejia, D. Booser,


R. L. Theriault, A. U. Buzdar, P. J. Dempsey, et al. (2006). Pharmacogenomic predictor
of sensitivity to preoperative chemotherapy with paclitaxel and fluorouracil, doxorubicin,
and cyclophosphamide in breast cancer. Journal of Clinical Oncology 24 (26), 4236–4244.

Huang, S., J. Jin, and Z. Yao (2016). Partial correlation screening for estimating large
precision matrices, with applications to classification. The Annals of Statistics 44 (5),
2018–2057.

Johnson, C. C., A. Jalali, and P. Ravikumar (2011). High-dimensional sparse inverse
covariance estimation using greedy methods. arXiv preprint arXiv:1112.6411 .

Kalisch, M. and P. Bühlmann (2007). Estimating high-dimensional directed acyclic graphs


with the pc-algorithm. The Journal of Machine Learning Research 8, 613–636.

Lauritzen, S. L. (1996). Graphical Models. Oxford University Press.

Lawrance, A. J. (1976). On conditional and partial correlation. The American Statisti-


cian 30 (3), 146–149.

Ledoit, O. and M. Wolf (2004). A well-conditioned estimator for large-dimensional covari-


ance matrices. Journal of Multivariate Analysis 88 (2), 365–411.

Li, H. and J. Gui (2006). Gradient directed regularization for sparse gaussian concentration graphs, with applications to inference of genetic networks. Biostatistics 7 (2), 302–317.

Lee, W. and Y. Liu (2015). Joint estimation of multiple precision matrices with common structures. Journal of Machine Learning Research 16 (1), 1035–1062.

Liang, F., Q. Song, and P. Qiu (2015). An equivalent measure of partial correlation coeffi-
cients for high-dimensional gaussian graphical models. Journal of the American Statis-
tical Association 110 (511), 1248–1265.

Liu, H. and L. Wang (2012). Tiger: A tuning-insensitive approach for optimally estimating
gaussian graphical models. arXiv preprint arXiv:1209.2437 .

Matthews, B. (1975). Comparison of the predicted and observed secondary structure of T4 phage lysozyme. Biochimica et Biophysica Acta 405 (2), 442–451.

Meinshausen, N. and P. Bühlmann (2006). High-dimensional graphs and variable selection
with the lasso. The Annals of Statistics 34 (3), 1436–1462.

Peng, J., P. Wang, N. Zhou, and J. Zhu (2009). Partial correlation estimation by joint
sparse regression models. Journal of the American Statistical Association 104 (486),
735–746.

Ravikumar, P., M. J. Wainwright, G. Raskutti, B. Yu, et al. (2011). High-dimensional


covariance estimation by minimizing `1 -penalized log-determinant divergence. Electronic
Journal of Statistics 5, 935–980.

Ren, Z., T. Sun, C.-H. Zhang, H. H. Zhou, et al. (2015). Asymptotic normality and opti-
malities in estimation of large gaussian graphical models. The Annals of Statistics 43 (3),
991–1026.

Rütimann, P., P. Bühlmann, et al. (2009). High dimensional sparse covariance estimation
via directed acyclic graphs. Electronic Journal of Statistics 3, 1133–1160.

Spirtes, P., C. N. Glymour, and R. Scheines (2000). Causation, Prediction, and Search.
MIT press.

Yuan, M. (2010). High dimensional inverse covariance matrix estimation via linear pro-
gramming. The Journal of Machine Learning Research 11, 2261–2286.

Yuan, M. and Y. Lin (2007). Model selection and estimation in the gaussian graphical
model. Biometrika 94 (1), 19–35.

Zhou, S., P. Rütimann, M. Xu, and P. Bühlmann (2011). High-dimensional covariance


estimation based on gaussian graphical models. The Journal of Machine Learning Re-
search 12, 2975–3026.

