This article provides an overview of the current landscape of signal processing (SP) on directed graphs (digraphs). Directionality is inherent to many real-world (information, transportation, biological) networks, and it should play an integral role in processing and learning from network data. We thus lay out a comprehensive review of recent advances in SP on digraphs, offering insights through comparisons with results available for undirected graphs, discussing emerging directions, establishing links with related areas in machine learning and causal inference in statistics, as well as illustrating their practical relevance to timely applications. To this end, we begin by surveying (orthonormal) signal representations and their graph-frequency interpretations based on novel measures of signal variation for digraphs. We then move on to filtering, a central component in deriving a comprehensive theory of SP on digraphs. Indeed, through the lens of filter-based generative signal models, we explore a unified framework to study inverse problems (e.g., sampling and deconvolution on networks), the statistical analysis of random signals, and the topology inference of digraphs from nodal observations.
A Motivating Starting Point: The Graph Fourier Transform for Undirected Graphs
Consider an undirected graph G with combinatorial Laplacian L = D − A chosen as the graph shift operator [3], where D stands for the diagonal degree matrix. The symmetric L can always be decomposed as L = V diag(λ) V^T, with V := [v_1, …, v_N] collecting the orthonormal eigenvectors of the Laplacian and λ := [λ_1, …, λ_N]^T its nonnegative eigenvalues. The graph Fourier transform (GFT) of x with respect to L is the signal x̂ = [x̂_1, …, x̂_N]^T defined as x̂ = V^T x. The inverse GFT (iGFT) of x̂ is given by x = V x̂, which is a proper inverse by the orthogonality of V.

The iGFT formula x = V x̂ = Σ_{k=1}^N x̂_k v_k allows one to synthesize x as a sum of orthogonal frequency components v_k. The contribution of v_k to the signal x is the real-valued GFT coefficient x̂_k. The GFT encodes a notion of signal variability over the graph akin to the notion of frequency in the Fourier analysis of temporal signals. To understand this analogy, define the total variation of the graph signal x with respect to the Laplacian L (also known as the Dirichlet energy) as the following quadratic form:

TV_2(x) := x^T L x = Σ_{i<j} A_ij (x_i − x_j)^2.  (S1)

The total variation TV_2(x) is a smoothness measure, quantifying how much the signal x changes with respect to the graph topology encoded in A.

Back to the GFT, consider the total variation of the eigenvectors v_k, which is given by TV_2(v_k) = v_k^T L v_k = λ_k. It follows that the eigenvalues 0 = λ_1 ≤ λ_2 ≤ … ≤ λ_N can be viewed as graph frequencies, indicating how the eigenvectors (i.e., frequency components) vary over the graph G. Accordingly, the GFT and iGFT offer a decomposition of the graph signal x into spectral components that characterize different levels of variability.
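These definitions are easy to verify numerically. The following is a minimal NumPy sketch (the four-node graph and the signal are arbitrary choices for illustration, not taken from the article) that computes the GFT, checks the iGFT, and confirms that TV_2(v_k) = λ_k:

```python
import numpy as np

# Small undirected graph: symmetric adjacency A (arbitrary four-node example).
A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 0],
              [1, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
D = np.diag(A.sum(axis=1))  # diagonal degree matrix
L = D - A                   # combinatorial Laplacian

# Orthonormal eigendecomposition L = V diag(lam) V^T (eigh handles symmetric L).
lam, V = np.linalg.eigh(L)

x = np.array([1.0, 2.0, 0.5, -1.0])  # a graph signal
x_hat = V.T @ x                      # GFT
x_rec = V @ x_hat                    # iGFT recovers x by orthonormality of V

def tv2(z, L):
    """Total variation (Dirichlet energy) z^T L z."""
    return z @ L @ z

# TV2 of the k-th eigenvector equals the k-th eigenvalue: the graph frequencies.
freqs = np.array([tv2(V[:, k], L) for k in range(len(lam))])

print(np.allclose(x_rec, x), np.allclose(freqs, lam))
```

Because V is orthonormal, Parseval's relation ‖x‖_2 = ‖x̂‖_2 holds as well, so energy can be read off in either domain.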
[FIGURE 1 graphic. Panel (a): a graph signal (GS) input x on G is mapped by H(G) to a GS output y on G. Panel (b): linear graph filters with nonsymmetric S:
H(h, S) := Σ_{l=0}^{L−1} h_l S^l (shift invariant),
H_nv({h_l}_{l=0}^{L−1}, S) := Σ_{l=0}^{L−1} diag(h_l) S^l (node variant),
H_ev({H_l}_{l=0}^{L−1}, S) := Σ_{l=0}^{L−1} (H_l ∘ S) S^{l−1} (edge variant).]
FIGURE 1. (a) Graph filters as generic operators that transform a graph signal (GS) input into an output. The graph filter processes the features of the input, taking into account the topology of the digraph where the signals are defined. (b) The different types of linear graph filters: a regular (shift-invariant) graph filter H, a node-variant graph filter H_nv, and an edge-variant graph filter H_ev. The number of parameters (coefficients) is L, NL, and EL, respectively. Due to their polynomial definition, all of these filters can operate over digraphs (nonsymmetric S). (c) Nonlinear graph signal operators using a (potentially deep) NN with L_N layers. Each layer consists of a parametrized graph-aware linear transformation (given, e.g., by any of the linear graph filters described previously) followed by a pointwise nonlinearity [cf. (6)–(8)].
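In code, the three linear filter families differ only in how the tap of order l multiplies S^l. Below is an illustrative NumPy rendering (the random digraph and coefficients are placeholders; the summation convention for the edge-variant filter, starting at l = 1 so that only nonnegative powers of S appear, is an assumption of this sketch):

```python
import numpy as np

rng = np.random.default_rng(0)
N, L = 5, 3

# Nonsymmetric shift operator S of a small random digraph (illustration only).
S = (rng.random((N, N)) < 0.4).astype(float)
np.fill_diagonal(S, 0)

def shift_invariant(h, S):
    """H(h, S) = sum_l h_l S^l: one scalar tap per order (L parameters)."""
    return sum(h[l] * np.linalg.matrix_power(S, l) for l in range(len(h)))

def node_variant(Hn, S):
    """H_nv = sum_l diag(h_l) S^l: one tap per node and order (N*L parameters)."""
    return sum(np.diag(Hn[l]) @ np.linalg.matrix_power(S, l) for l in range(Hn.shape[0]))

def edge_variant(He, S):
    """H_ev = sum_{l>=1} (H_l o S) S^(l-1): one tap per edge and order
    (index convention assumed here so that only nonnegative powers of S appear)."""
    return sum((He[l] * S) @ np.linalg.matrix_power(S, l - 1) for l in range(1, He.shape[0]))

h = rng.standard_normal(L)          # L coefficients
Hn = rng.standard_normal((L, N))    # N*L coefficients
He = rng.standard_normal((L, N, N)) # up to E*L free coefficients (masked by S)
x = rng.standard_normal(N)

y_si = shift_invariant(h, S) @ x
y_nv = node_variant(Hn, S) @ x
y_ev = edge_variant(He, S) @ x

# Sanity check: identical per-node taps collapse H_nv to the shift-invariant filter.
Hn_const = np.tile(h[:, None], (1, N))
print(np.allclose(node_variant(Hn_const, S) @ x, y_si))
```

Note that none of the three constructions requires S to be symmetric or even diagonalizable, which is exactly why they carry over to digraphs.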
most widely used approach for (convolutional) GNNs being to replace T_{θ^(ℓ)}{·} with a graph filter. Precisely in inspiring this approach is where GSP insights and advances have been transformative because basic shift-invariance properties and convolution operations are otherwise not well defined for graph signals [19].

Early contributions following the graph-filtering rationale have emerged from the machine learning community. The spectral approach in [19] relies on the Laplacian eigenvectors V and parametrizes the transformation T_{θ^(ℓ)}{·} = V diag(θ^(ℓ)) V^T via ĝ = θ^(ℓ), the filter's frequency response in (3), which is learned using backpropagation. Although successful in many applications, when dealing with digraphs these approaches suffer from the same limitations as those discussed for their linear counterparts. Moreover, scalability is often an issue due to the computational burden associated with calculating the eigenvectors of large (albeit sparse) graphs. Alternative architectures proposed replacing T_{θ^(ℓ)}{z^(ℓ−1)} with (I − θ^(ℓ) A) z^(ℓ−1), where A is the (possibly nonsymmetric) adjacency matrix of the graph, and θ^(ℓ) is a learnable scalar. To increase the number of parameters, some authors have considered learning the nonzero entries of A, assuming that its support is known. A more natural approach is to replace T_{θ^(ℓ)}{·} with the polynomial filter in (4) and consider the filter taps h^(ℓ) = θ^(ℓ) as the parameters to be learned (see [18] and the references therein). Once again, implementing (6)–(8) with H^(ℓ) = Σ_{l=0}^{L_ℓ−1} h_l^(ℓ) S^l in lieu of T_{θ^(ℓ)}{·} exhibits a number of advantages because 1) the graph filter is always well defined (even for nondiagonalizable GSOs), 2) the degree of the filter controls the complexity of the architecture (the number of learnable parameters), and 3) the polynomial definition guarantees that the resultant graph filter can be implemented efficiently (via the successive application of sparse matrices), which is essential in scaling to large data sets. As in standard NN architectures, GNN parameters (i.e., the filter coefficients for each of the layers) are learned using stochastic gradient descent. For supervised learning tasks, the goal is to minimize a suitable loss function over a training set of (labeled) examples. The sparsity of S and the efficient implementation of polynomial graph filters [compare (3)] are cardinal properties to keep the overall computational complexity in check.

Beyond convolutional GNNs, the aforementioned findings are also valid for recurrent GNNs. Furthermore, one can also replace the graph filter H^(ℓ) = Σ_{l=0}^{L_ℓ−1} h_l^(ℓ) S^l either with a set of parallel filters or with its node-variant H_nv^(ℓ) or edge-variant H_ev^(ℓ) counterparts. All of them preserve the distributed imple-

Inverse problems on digraphs
Inverse problems such as sampling and deconvolution have played a central role in the development of GSP. Different modeling assumptions must be considered when addressing these problems for digraphs; a good practice is to leverage the concepts introduced in the "GSP Preliminaries, Frequency Analysis, and Signal Representations" and "Graph Filters and Nonlinear Graph Signal Operators" sections while balancing practical utility with mathematical tractability. For instance, parsimonious signal models based on graph smoothness or bandlimitedness are widely adopted. Alternatively, observations can be modeled as the outputs of graph filters driven by white, sparse, or piecewise-constant inputs. This approach is particularly useful in applications dealing with diffusion processes defined over real-world networks with directional links. In this section, we formally introduce a selection of prominent inverse problems, present established approaches for their solution, and identify the main challenges when the signals at hand are defined over digraphs.

Sampling and reconstruction
The sampling of graph signals and their subsequent reconstruction have arguably been the most widely studied problems within GSP [2, Ch. 9]. Broadly speaking, the objective is to infer the value of the signal at every node from the observations at a few nodes by leveraging the structure of the graph. To describe the problem formally, let us introduce the fat, binary, M × N sampling matrix C_M and define the sampled signal as x̄ = C_M x. Notice that if M represents the subset of M < N nodes where the signal is sampled, C_M has exactly one nonzero element per row, and the positions of those nonzero elements correspond to the indexes of the nodes in M. Then, the signal x̄ is indeed a selection of M out of the N elements of x. This raises two fundamental questions: How can we reconstruct x from x̄, and how can we design C_M to facilitate this reconstruction?

Starting with the first question, early works assumed the graph to be undirected and the signal x to be bandlimited, i.e., to be a linear combination of just a few leading eigenvectors of the GSO. The GSO was typically set to the Laplacian L, with its eigenvectors V = [v_1, …, v_N] being real valued and orthogonal. That is, the signal was assumed to be expressible as x = Σ_{k=1}^K x̂_k v_k := V_K x̂_K, where x̂_K ∈ R^K collects the K active frequency coefficients, and V_K is a submatrix of the GFT. Indeed, because the leading
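For the undirected bandlimited model just described, reconstruction from M ≥ K samples reduces to least squares: because x̄ = C_M V_K x̂_K, the coefficients follow as x̂_K = (C_M V_K)^† x̄ whenever C_M V_K has full column rank. A minimal NumPy sketch under those assumptions (the graph and the sample set are chosen at random purely for illustration):

```python
import numpy as np

rng = np.random.default_rng(1)
N, K, M = 8, 3, 4  # nodes, bandwidth, number of samples (M >= K)

# Undirected graph and orthonormal Laplacian eigenbasis.
A = (rng.random((N, N)) < 0.5).astype(float)
A = np.triu(A, 1)
A = A + A.T
lam, V = np.linalg.eigh(np.diag(A.sum(1)) - A)

# K-bandlimited signal: x = V_K x_hat_K.
VK = V[:, :K]
x = VK @ rng.standard_normal(K)

# Sampling matrix C_M: one nonzero per row, selecting M of the N nodes.
nodes = rng.choice(N, size=M, replace=False)
C = np.zeros((M, N))
C[np.arange(M), nodes] = 1.0
x_bar = C @ x  # observed samples

# Least-squares reconstruction of the K active frequency coefficients.
x_hat_K = np.linalg.lstsq(C @ VK, x_bar, rcond=None)[0]
x_rec = VK @ x_hat_K
print(np.allclose(x_rec, x))
```

The sample-set design question raised above amounts to choosing the rows of C_M so that C_M V_K is as well conditioned as possible.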
FIGURE 2. A summary of the inverse problems introduced. In the schematic representation, graph signals are depicted as red circles and graph operators
as blue rectangles. Notice that filters are functions of the coefficients h and the GSO S [cf. (4)]. However, because S is assumed to be known for every
problem considered, we succinctly represent filters by their coefficients, h. The first four problems refer to the single-input, single-output scenario with
blind deconvolution being the most challenging because only (a sampled version of) the output is observed. Note that the problem frameworks in (b) can
be further extended to the case where the output is partially observed, as in (a). We omit this illustration to minimize redundancy and because these more
challenging problems are generally ill posed even in the case where y is fully observed.
Z* = argmin_Z ‖y − A(Z)‖_2^2 + α_1 ‖Z‖_* + α_2 ‖Z‖_{2,1}.  (11)

The nuclear-norm regularizer ‖·‖_* in (11) promotes a low-rank solution because we know that Z should be the outer product of the true variables of interest, that is, x and h. On the other hand, the ℓ_{2,1} mixed norm ‖Z‖_{2,1} = Σ_{i=1}^N ‖z_i‖_2 is the sum of the ℓ_2 norms of the rows of Z, thus promoting a row-sparse structure in Z. This is aligned with a sparse input x forcing rows of Z to be entirely zero from the outer product. After solving for Z*, one may recover x and h from, e.g., a rank-one decomposition of Z*.

Extensions to multiple-input, multiple-output (MIMO) pairs (with a common filter) along with theoretical guarantees for the case where the GSO S is normal (i.e., SS^H = S^H S) can be found in [5]. Interestingly, it was empirically observed and theoretically demonstrated that blind deconvolution in circulant graphs (such as the directed cycle that represents the domain of classic SP) corresponds to the most favorable setting. The related case of a single graph signal as the input to multiple filters (generating multiple outputs) was recently studied in [27], thus providing a generalization of the classical blind multichannel identification problem in digital SP. Moreover, [26] addresses the blind demixing case where a single observation formed by the sum of multiple outputs is available, and it is assumed that these outputs are generated by different sparse inputs diffused through different graph filters. This variation of the problem is severely ill posed, and strong regularization conditions should be assumed to ensure recovery, with the problem being easier if the graph filters are defined over different GSOs {S_i}_{i=1}^I. Figure 2 provides an overarching view of the problems mentioned in this section. See also [25] for additional signal-recovery problems that can be written in the encompassing framework of (9).

As previously explained, a key feature that allows using the problem formulations introduced in this section for signals defined on digraphs is that the generative graph filter was incorporated in polynomial form. Unfortunately, many of the theoretical guarantees for solving these problems rely heavily on the spectral analysis of the GSO, thus assuming
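The final recovery step mentioned above is worth making explicit: if the lifted solution Z* is (close to) the rank-one matrix x h^T, its leading singular pair recovers x and h up to the inherent scaling ambiguity. A NumPy sketch of that step in isolation (Z is synthesized directly here rather than obtained by actually solving (11)):

```python
import numpy as np

rng = np.random.default_rng(2)
N, Lf = 10, 4  # nodes and number of filter taps

# Ground truth: sparse input x (3 active nodes) and filter coefficients h.
x = np.zeros(N)
x[rng.choice(N, 3, replace=False)] = rng.standard_normal(3)
h = rng.standard_normal(Lf)

# Lifted rank-one variable: row-sparse, with zero rows outside supp(x).
Z = np.outer(x, h)

# Rank-one decomposition via the leading singular pair of Z.
U, s, Vt = np.linalg.svd(Z)
x_est = np.sqrt(s[0]) * U[:, 0]
h_est = np.sqrt(s[0]) * Vt[0]

# x and h are identifiable only up to a scalar factor; fix it for comparison.
i = np.abs(x).argmax()
c = x[i] / x_est[i]
print(np.allclose(c * x_est, x), np.allclose(h_est / c, h))
```

The scalar ambiguity is intrinsic to blind deconvolution (scaling x by c and h by 1/c leaves the observations unchanged), which is why recovery guarantees are always stated modulo this factor.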
challenging problem, even more so when the graph at hand is directed. Early foundational contributions can be traced back several decades to the statistical literature of graphical model selection (see, e.g., [4, Ch. 7] and the opening of the "Statistical Digraph SP" section). Discovering directional influence among variables is at the heart of causal inference, and identifying the cause-and-effect digraphs (so-termed structural causal models) from observational data is a notoriously difficult problem [6, Ch. 7 and 10].

Recently, the fresh modeling and signal representation perspectives offered by GSP have sparked renewed interest in the field. Initial efforts have focused mostly on learning undi-

s.t. Ω = diag(ω), S_ii = 0, i = 1, …, N,  (14)

where the ℓ_1-norm penalty promotes sparsity in the adjacency matrix. Both edge sparsity and endogenous inputs play a critical role in guaranteeing that the SEM parameters (13) are uniquely identifiable (see also [28]). Acknowledging the limitations of linear models, [29] leverages kernels within the SEM framework to model nonlinear pairwise dependencies among network nodes (see the "Applications" section for results on the identification of gene-regulatory networks).

Although SEMs capture only contemporaneous relationships among the nodal variables (i.e., SEMs are memoryless),
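To make the SEM identification task concrete, the sketch below generates noiseless data from a linear SEM y = Sy + Ωx and recovers S and ω by plain per-node least squares. This is a simplified illustration, not the estimator in (14): the ℓ_1 penalty is omitted, which already suffices in this idealized noiseless, well-conditioned setting.

```python
import numpy as np

rng = np.random.default_rng(3)
N, T = 6, 50  # network size and number of observed samples

# Ground-truth SEM: hollow (zero-diagonal) S with small spectral radius, plus Omega.
S = (rng.random((N, N)) < 0.3) * rng.uniform(0.1, 0.4, (N, N))
np.fill_diagonal(S, 0)
S /= 2 * max(1.0, np.abs(np.linalg.eigvals(S)).max())  # keep I - S invertible
omega = rng.uniform(0.5, 1.5, N)

# Endogenous observations: Y = S Y + diag(omega) X  =>  Y = (I - S)^{-1} diag(omega) X.
X = rng.standard_normal((N, T))  # exogenous inputs
Y = np.linalg.solve(np.eye(N) - S, np.diag(omega) @ X)

# Per-node identification: y_i = S[i, :] Y + omega_i x_i, with S[i, i] = 0 enforced
# by excluding node i from its own regressors.
S_hat = np.zeros((N, N))
omega_hat = np.zeros(N)
for i in range(N):
    others = [j for j in range(N) if j != i]
    R = np.vstack([Y[others], X[i:i + 1]])  # regressors: other nodes' signals + own input
    theta = np.linalg.lstsq(R.T, Y[i], rcond=None)[0]
    S_hat[i, others] = theta[:-1]
    omega_hat[i] = theta[-1]

print(np.allclose(S_hat, S), np.allclose(omega_hat, omega))
```

With noise or fewer samples than nodes, the unregularized regressions above break down; this is precisely where the edge-sparsity penalty in (14) becomes essential.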
FIGURE 3. Temperature denoising using the DGFT [14]. (a) The graph signal of average annual temperature in Fahrenheit for the contiguous United States. In the depicted digraph, a directed edge joins two states if they share a border, and the edge directions go from south to north. (b) The DGFT of the original signal (x̂) and the noisy signal (ŷ) along with their cumulative energy distribution across frequencies. (c) A sample realization of the true, noisy, and recovered temperature signals for a filter bandwidth K = 3.
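The denoising experiment of Figure 3 can be emulated in a few lines: corrupt a signal concentrated on K low-frequency components and keep only the K leading GFT coefficients of the observation. The sketch below is a generic stand-in that uses the eigenvectors of an undirected Laplacian as the orthonormal basis; the actual experiment relies on the DGFT of [14], whose basis is obtained differently.

```python
import numpy as np

rng = np.random.default_rng(4)
N, K = 45, 3  # number of nodes and filter bandwidth (K = 3, as in Figure 3)

# Orthonormal frequency basis: eigenvectors of a random undirected Laplacian.
A = (rng.random((N, N)) < 0.1).astype(float)
A = np.triu(A, 1)
A = A + A.T
lam, V = np.linalg.eigh(np.diag(A.sum(1)) - A)

# Smooth (K-bandlimited) signal plus white noise.
x = 10 * (V[:, :K] @ rng.standard_normal(K))
y = x + rng.standard_normal(N)

# Low-pass denoising: keep only the K lowest-frequency GFT coefficients of y.
y_hat = V.T @ y
x_rec = V[:, :K] @ y_hat[:K]

err_noisy = np.linalg.norm(y - x) / np.linalg.norm(x)
err_rec = np.linalg.norm(x_rec - x) / np.linalg.norm(x)
print(err_rec <= err_noisy)  # projecting onto the bandlimited subspace cannot increase the error
```

Since x lies in the span of the K retained basis vectors, the residual error is exactly the projection of the noise onto that subspace, whose norm never exceeds that of the full noise vector.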
FIGURE 4. Identifying the author of a text using GNNs [34]. (a) An example of a WAN with 40 function words as nodes built from the play "The Humorous Lieutenant" by John Fletcher. The radius of the nodes is proportional to the word count, and the darker the edge color, the higher the edge weight. Directionality has been ignored for ease of representation, but the GNNs are defined on the directed WAN. (b) The authorship attribution test error in GNN architectures with localized activation functions for the classification of Emily Brontë versus her contemporaries. Max.: maximum; med.: median.
FIGURE 6. Inferring (directed) gene-regulatory networks from expression data [29]. The networks are inferred following the SEM formulation in (14) for (a)
a linear kernel, (b) a polynomial kernel of order two, and (c) a Gaussian kernel with unit variance.
range of signal-recovery problems was selectively covered, focusing on inverse problems in digraphs, including sampling, deconvolution, and system identification. A statistical viewpoint for signal modeling was also discussed by extending the definition of weak stationarity of random graph processes to the directed domain. The last stop was to review recent results that applied the tools surveyed in this article to the problem of learning the topology of a digraph from nodal observations, an approach that can lead to meaningful connections between GSP and the field of causal inference in statistics. A common theme in the extension of established GSP concepts to the less-explored realm of digraphs is that the definitions and notions that rely heavily on spectral properties are challenging to generalize, whereas those that can be explicitly postulated in the vertex domain are more amenable to extension to digraphs.

A diverse gamut of potential research avenues naturally follows from the developments presented. Efficient approaches for the computation of the multiple GFTs for digraphs (akin to the fast Fourier transform in classical SP) would facilitate the adoption of this methodology in large-scale settings. The incorporation of nonlinear (median, Volterra, and NN) graph signal operators as generative models for the solution of inverse problems is another broad area of promising research. Deep generative models for signals defined in regular domains (such as images) have shown remarkable success over the last several years, and part of that success can be extended to our more challenging domain. Equally interesting is that the use of deep learning to generate the graphs themselves (as opposed to the graph signals) has recently gained traction so that, as discussed in this article, one can conceive NN architectures that learn (and even generate) digraphs from training graph signals while encoding desirable topological features. One final direction of future research is the extension of the concepts discussed in this article to the case of higher-order directed relational structures. The generalization of GSP to hypergraphs through tensor models and simplicial complexes has been explored in recent years, but their analysis in directed scenarios is almost uncharted research territory.

Acknowledgments
The work presented in this article was supported by the Spanish Federal Grants Klinilycs and SPGraph (TEC2016-75361-R and PID2019-105032GB-I00) and NSF Awards CCF-1750428 and ECCS-1809356. Antonio G. Marques is the corresponding author.

Authors
Antonio G. Marques ([email protected]) received his five-year diploma and doctorate degree in telecommunications engineering (both with highest honors) from
[2] P. Djuric and C. Richard, Cooperative and Graph Signal Processing: Principles and Applications. San Diego, CA: Academic Press, 2018.

[3] D. Shuman, S. Narang, P. Frossard, A. Ortega, and P. Vandergheynst, "The emerging field of signal processing on graphs: Extending high-dimensional data analysis to networks and other irregular domains," IEEE Signal Process. Mag., vol. 30, no. 3, pp. 83–98, 2013. doi: 10.1109/MSP.2012.2235192.

[4] E. D. Kolaczyk, Statistical Analysis of Network Data: Methods and Models. New York: Springer-Verlag, 2009.

[5] S. Segarra, G. Mateos, A. G. Marques, and A. Ribeiro, "Blind identification of graph filters," IEEE Trans. Signal Process., vol. 65, no. 5, pp. 1146–1159, 2017. doi: 10.1109/TSP.2016.2628343.

[6] J. Peters, D. Janzing, and B. Schölkopf, Elements of Causal Inference: Foundations and Learning Algorithms. Cambridge, MA: MIT Press, 2017.

[7] A. Sandryhaila and J. M. F. Moura, "Discrete signal processing on graphs," IEEE Trans. Signal Process., vol. 61, no. 7, pp. 1644–1656, 2013. doi: 10.1109/TSP.2013.2238935.

[8] A. Sandryhaila and J. M. F. Moura, "Discrete signal processing on graphs: Frequency analysis," IEEE Trans. Signal Process., vol. 62, no. 12, pp. 3042–3054, 2014. doi: 10.1109/TSP.2014.2321121.

[9] H. Sevi, G. Rilling, and P. Borgnat, "Harmonic analysis on directed graphs and applications: From Fourier analysis to wavelets," 2018, arXiv:1811.11636v2.

[10] F. Chung, "Laplacians and the Cheeger inequality for directed graphs," Ann. Comb., vol. 9, no. 1, pp. 1–19, 2005. doi: 10.1007/s00026-005-0237-z.

[11] J. A. Deri and J. M. F. Moura, "Spectral projector-based graph Fourier transforms," IEEE J. Sel. Topics Signal Process., vol. 11, no. 6, pp. 785–795, 2017. doi: 10.1109/JSTSP.2017.2731599.

[27] Y. Zhu, F. J. Iglesias, A. G. Marques, and S. Segarra, "Estimation of network processes via blind graph multi-filter identification," in Proc. IEEE Int. Conf. Acoustics, Speech and Signal Processing (ICASSP), May 2019, pp. 5451–5455. doi: 10.1109/ICASSP.2019.8683844.

[28] G. B. Giannakis, Y. Shen, and G. V. Karanikolas, "Topology identification and learning over graphs: Accounting for nonlinearities and dynamics," Proc. IEEE, vol. 106, no. 5, pp. 787–807, 2018. doi: 10.1109/JPROC.2018.2804318.

[29] Y. Shen, B. Baingana, and G. B. Giannakis, "Kernel-based structural equation models for topology identification of directed networks," IEEE Trans. Signal Process., vol. 65, no. 10, pp. 2503–2516, 2017. doi: 10.1109/TSP.2017.2664039.

[30] A. Bolstad, B. D. Van Veen, and R. Nowak, "Causal network inference via group sparse regularization," IEEE Trans. Signal Process., vol. 59, no. 6, pp. 2628–2641, 2011. doi: 10.1109/TSP.2011.2129515.

[31] J. Mei and J. M. F. Moura, "Signal processing on graphs: Causal modeling of unstructured data," IEEE Trans. Signal Process., vol. 65, no. 8, pp. 2077–2092, 2017. doi: 10.1109/TSP.2016.2634543.

[32] R. Shafipour, S. Segarra, A. G. Marques, and G. Mateos, "Directed network topology inference via graph filter identification," in Proc. IEEE Data Science Workshop (DSW), June 2018, pp. 210–214. doi: 10.1109/DSW.2018.8439888.

[33] S. Segarra, A. G. Marques, G. Mateos, and A. Ribeiro, "Network topology inference from spectral templates," IEEE Trans. Signal Inf. Process. Netw., vol. 3, no. 3, pp. 467–483, 2017. doi: 10.1109/TSIPN.2017.2731051.

[34] L. Ruiz, F. Gama, A. G. Marques, and A. Ribeiro, "Invariance-preserving localized activation functions for graph neural networks," IEEE Trans. Signal Process., vol. 68, pp. 127–141, Nov. 2019. doi: 10.1109/TSP.2019.2955832.

SP