
GRAPH SIGNAL PROCESSING:

FOUNDATIONS AND EMERGING DIRECTIONS

Antonio G. Marques, Santiago Segarra, and Gonzalo Mateos

Signal Processing on Directed Graphs


The role of edge directionality when processing
and learning from network data

This article provides an overview of the current landscape of signal processing (SP) on directed graphs (digraphs). Directionality is inherent to many real-world (information, transportation, biological) networks, and it should play an integral role in processing and learning from network data. We thus lay out a comprehensive review of recent advances in SP on digraphs, offering insights through comparisons with results available for undirected graphs, discussing emerging directions, establishing links with related areas in machine learning and causal inference in statistics, as well as illustrating their practical relevance to timely applications. To this end, we begin by surveying (orthonormal) signal representations and their graph-frequency interpretations based on novel measurements of signal variation for digraphs. We then move on to filtering, a central component in deriving a comprehensive theory of SP on digraphs. Indeed, through the lens of filter-based generative signal models, we explore a unified framework to study inverse problems (e.g., sampling and deconvolution on networks), the statistical analysis of random signals, and the topology inference of digraphs from nodal observations.

Introduction and motivation


Coping with the panoply of challenges found at the confluence of data and network sciences necessitates fundamental breakthroughs in the modeling, identification, and controllability of networked (complex) system processes, often conceptualized as signals defined on graphs [1]. Graph-supported signals abound in real-world applications, including vehicle-congestion levels over road networks, neurological activity signals supported on brain-connectivity networks, and fake news that diffuse on online social networks. There is, however, an evident mismatch between our scientific understanding of signals defined over regular domains such as time or space and graph signals, due, in part, to the fact that the prevalence of network-related problems and access to quality network data are recent events.

To address these problems, machine learning and SP over graphs have emerged as active areas aimed at making sense of large-scale data sets from a network-centric perspective.

Upon modeling the domain of the information as a graph and the observations at hand as graph signals, the graph SP (GSP) body of work has put forth models that relate the properties of the signals with those of the graph, along with algorithms that fruitfully leverage this relational structure to better process and learn from network data. Most GSP efforts to date assume that the underlying networks are undirected [2]. Said graphs are equivalently represented by symmetrical matrices whose (well-behaved) spectral properties can be used to process the signals associated with the network. The most prominent example is the graph Laplacian, which not only gives rise to a natural definition of signal smoothness but also offers a complete set of orthonormal eigenvectors that serve as a Fourier-type basis for graph signals [3].

Their scarcer adoption notwithstanding, digraph models are more adequate (and, in fact, more accurate) for a number of applications. Information networks such as scientific citations or the World Wide Web itself are typically directed, and flows in technological (e.g., transportation, power, and communication) networks are often unidirectional. The presence of directionality plays a critical role when the measurements taken in those networks need to be processed to remove noise, outliers, and artifacts, and this requires new tools and algorithms that do not assume that the matrices representing the underlying graphs are symmetrical.

Gene-regulatory networks are highly nonreciprocal, and this lack of reciprocity needs to be accounted for when, for example, the goal is to predict a gene or a protein functionality from a small set of observations obtained from expensive experiments. Pairwise relations among social actors are rarely purely symmetrical [4] and, in fact, when the graph captures some level of influence on a social network, the lack of symmetry is essential to accurately solve inverse problems that aim to separate the leaders from the followers [5]. More abstractly, when the graph encodes (often unknown) relations between observed variables, directionality is vital to identify the nodes representing the cause and those representing the effect [6], calling for fundamental changes in the algorithms that use available signal observations to learn the topology of the underlying graph. Accordingly, a first step to address these and other related questions is to develop judicious models that account for directionality while leading to tractable processing tools and efficient algorithms. That is precisely the goal of this tutorial article, which aims at delineating the analytical background and relevance of innovative tools to analyze and process signals defined over digraphs. Throughout, concepts will be made accessible to SP researchers (including those without a strong background in network science) via a combination of rigorous problem formulations and intuitive reasoning. A recurrent message with important practical ramifications interweaves the narrative: different from the undirected case, where graph spectrum-based tools offer a number of distinct advantages [3], vertex-domain generative graph signal models that rely on nonsymmetric network operators may be preferable when it comes to signal and information processing on directed networks.

GSP preliminaries, frequency analysis, and signal representations

After introducing the necessary graph-theoretic notation and background, this section presents different generalizations of smoothness and total-variation measurements for signals defined on digraphs. This is particularly relevant to the graph Fourier transform (GFT), which decomposes a graph signal into components that describe different modes of variation with respect to the graph topology. Although adopting the real-valued orthonormal eigenvectors of the Laplacian as the frequency basis for undirected graphs is well motivated and widely used in practice [1], extending the GFT framework to digraphs is not a simple pursuit and different alternatives exist, as we explain in the "Digraph Fourier Transforms: Spectral Methods" and "Digraph Fourier Transforms: Orthonormal Transform Learning" sections.

Graph signals and the graph shift operator

Let $\mathcal{G}$ denote a digraph with a set of nodes $\mathcal{N}$ (with cardinality $N$) and a set of links $\mathcal{E}$. If $i$ is connected to $j$, then $(i,j)\in\mathcal{E}$. Because $\mathcal{G}$ is directed, local connectivity is captured by the set $\mathcal{N}_i := \{j \,|\, (j,i)\in\mathcal{E}\}$, which stands for the (incoming) neighborhood of $i$. For any given $\mathcal{G}$, we define the adjacency matrix $A\in\mathbb{R}^{N\times N}$ as a sparse matrix with nonzero elements $A_{ji}$ if and only if $(i,j)\in\mathcal{E}$. The value of $A_{ji}$ captures the strength of the connection from $i$ to $j$ and, because the graph is directed, the matrix $A$ is, in general, nonsymmetric.

The focus of the article is on analyzing and modeling (graph) signals defined on the node set $\mathcal{N}$. These signals can be represented as vectors $x = [x_1, \ldots, x_N]^T \in \mathbb{R}^N$, with $x_i$ being the value of the signal at node $i$. As the vectorial representation does not explicitly account for the structure of the graph, $\mathcal{G}$ can be endowed with the so-called graph shift operator (GSO) $S$ [7], [8]. The shift $S\in\mathbb{R}^{N\times N}$ is a matrix whose entry $S_{ji}$ can be nonzero only if $i=j$ or if $(i,j)\in\mathcal{E}$. The sparsity pattern of the matrix $S$ captures the local structure of $\mathcal{G}$, but we make no specific assumptions on the values of its nonzero entries, which will depend on the application at hand [1].

To justify the adopted graph shift terminology, consider the directed cycle graph whose circulant adjacency matrix $A_{dc}$ is zero, except for entries $A_{ji} = 1$ whenever $i = \mathrm{mod}_N(j) + 1$, where $\mathrm{mod}_N(x)$ denotes the modulus (remainder) obtained after dividing $x$ by $N$. Such a graph can be used to represent the domain of discrete-time periodic signals with period $N$. If $S = A_{dc}$, then $Sx$ implements a circular shift of the entries in $x$, which corresponds to a one-unit time delay under the aforementioned interpretation [1]. Note though that, in general, $S$ need neither be invertible nor isometric, an important departure from the shift in discrete-time SP.

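As a quick numerical illustration of the directed-cycle example above, the following minimal NumPy sketch (added here for illustration; it is not from the article, and the variable names are ours) builds the circulant adjacency matrix and checks that applying $S = A_{dc}$ circularly shifts the entries of a graph signal.

```python
import numpy as np

N = 6
# Directed cycle: following the text, A_dc[j, i] = 1 whenever i = mod_N(j) + 1,
# i.e., (in 0-indexed terms) node (j + 1) % N is the single in-neighbor of node j.
A_dc = np.zeros((N, N))
for j in range(N):
    A_dc[j, (j + 1) % N] = 1.0

S = A_dc                       # adopt the adjacency matrix as the GSO
x = np.arange(N, dtype=float)  # graph signal [0, 1, ..., 5]

y = S @ x                      # one application of the graph shift
print(y)                       # [1. 2. 3. 4. 5. 0.]: a circular shift of the entries of x
assert np.allclose(y, np.roll(x, -1))
```

Whether this one-unit circular shift is read as a delay or an advance depends only on how the nodes are identified with time instants; the relevant point is that $S$ acts through purely local (in-neighbor) exchanges.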
The intuition behind $S$ is to represent a linear transformation that can be computed locally at the nodes of the graph, while it can be more general than the adjacency matrix. More rigorously, if the graph signal $y$ is defined as $y = Sx$, then node $i$ can compute $y_i$ as a linear combination of the signal values $x_j$ at node $i$'s neighbors $j\in\mathcal{N}_i$. The GSO will play a fundamental role in defining the counterpart of the Fourier transform for graph signals, which is discussed in this section, as well as graph filters that are introduced in the "Graph Filters and Nonlinear Graph Signal Operators" section.

Digraph Fourier transforms: Spectral methods

An instrumental GSP tool is the GFT, which decomposes a graph signal into orthonormal components describing different modes of variation with respect to the graph topology encoded in an application-dictated GSO $S$. The GFT allows equivalently representing a graph signal in two different domains: the vertex domain consisting of the nodes in $\mathcal{N}$ and the graph-frequency domain spanned by the spectral basis of $\mathcal{G}$. Therefore, signals can be manipulated in the frequency domain for the purpose of, e.g., denoising, compression, and feature extraction (see also the "Graph Filters and Nonlinear Graph Signal Operators" section). For didactic purposes, it is informative to introduce first the GFT for symmetrical graph Laplacians associated with undirected graphs (see "A Motivating Starting Point: The Graph Fourier Transform for Undirected Graphs"). In the remainder of this section, we show that the GFT can be defined for digraphs, where the interpretation of components as different modes of variability is not as clean and Parseval's identity may not hold, but its value toward yielding parsimonious spectral representations of network processes remains.

The Laplacian $L = D - A$ is not well defined for digraphs because $D$ is rendered meaningless when edges have directionality. One can instead consider a generic asymmetric GSO $S$, for instance, the adjacency matrix $A$ or one of the several generalized Laplacians for digraphs; see, e.g., [9] and [10]. Suppose the GSO is diagonalizable as $S = V\,\mathrm{diag}(\lambda)\,V^{-1}$, with $V := [v_1, \ldots, v_N]$ denoting the (nonorthogonal) eigenvectors of $S$ and $\lambda := [\lambda_1, \ldots, \lambda_N]^T$ its possibly complex-valued eigenvalues. Then, a widely adopted alternative is to redefine the GFT as $\tilde{x} = V^{-1}x$ [8]. Otherwise, one can resort to the Jordan decomposition of $S$ and use its generalized eigenvectors as the GFT basis; see also [11] for a careful treatment of the nondiagonalizable case, which relies on oblique spectral projectors to define the GFT. Setting the GFT to $V^{-1}$ for the directed case is an intuitively pleasing definition because the frequency basis is then given by the eigenvectors of the shift operator, as in discrete-time SP. Moreover, allowing for generic GSOs reveals the encompassing nature of the GFT relative to the time-domain discrete Fourier transform (DFT), the multidimensional DFT, and principal component analysis [12]. Toward interpreting graph frequencies, which are defined by the (possibly complex-valued, nonorthogonal) eigenvectors of the nonsymmetric $S$, consider the total-variation measure

$$\mathrm{TV}_1(x) := \|x - \bar{S}x\|_1, \qquad (1)$$

where $\bar{S} = S/|\lambda_{\max}|$, and $\lambda_{\max}$ is the spectral radius of $S$ [compare (S1)].

A Motivating Starting Point: The Graph Fourier Transform for Undirected Graphs

Consider an undirected graph $\mathcal{G}$ with combinatorial Laplacian $L = D - A$ chosen as the graph shift operator [3], where $D$ stands for the diagonal degree matrix. The symmetrical $L$ can always be decomposed as $L = V\,\mathrm{diag}(\lambda)\,V^T$, with $V := [v_1, \ldots, v_N]$ collecting the orthonormal eigenvectors of the Laplacian and $\lambda := [\lambda_1, \ldots, \lambda_N]^T$ its nonnegative eigenvalues. The graph Fourier transform (GFT) of $x$ with respect to $L$ is the signal $\tilde{x} = [\tilde{x}_1, \ldots, \tilde{x}_N]^T$ defined as $\tilde{x} = V^T x$. The inverse GFT (iGFT) of $\tilde{x}$ is given by $x = V\tilde{x}$, which is a proper inverse by the orthogonality of $V$.

The iGFT formula $x = V\tilde{x} = \sum_{k=1}^{N}\tilde{x}_k v_k$ allows one to synthesize $x$ as a sum of orthogonal-frequency components $v_k$. The contribution of $v_k$ to the signal $x$ is the real-valued GFT coefficient $\tilde{x}_k$. The GFT encodes a notion of signal variability over the graph akin to the notion of frequency in the Fourier analysis of temporal signals. To understand this analogy, define the total variation of the graph signal $x$ with respect to the Laplacian $L$ (also known as Dirichlet energy) as the following quadratic form:

$$\mathrm{TV}_2(x) := x^T L x = \sum_{i<j} A_{ij}(x_i - x_j)^2. \qquad (S1)$$

The total variation $\mathrm{TV}_2(x)$ is a smoothness measure, quantifying how much the signal $x$ changes with respect to the graph topology encoded in $A$.

Back to the GFT, consider the total variation of the eigenvectors $v_k$, which is given by $\mathrm{TV}_2(v_k) = v_k^T L v_k = \lambda_k$. It follows that the eigenvalues $0 = \lambda_1 \le \lambda_2 \le \cdots \le \lambda_N$ can be viewed as graph frequencies, indicating how the eigenvectors (i.e., frequency components) vary over the graph $\mathcal{G}$. Accordingly, the GFT and iGFT offer a decomposition of the graph signal $x$ into spectral components that characterize different levels of variability.
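The sidebar's computations are easy to reproduce numerically. The sketch below (our illustration, using a small hand-picked undirected graph) verifies the GFT/iGFT pair, the identity $\mathrm{TV}_2(x) = x^TLx = \sum_{i<j}A_{ij}(x_i-x_j)^2$, and Parseval's identity.

```python
import numpy as np

# Small undirected graph: symmetric adjacency and combinatorial Laplacian L = D - A.
A = np.array([[0., 1., 1., 0.],
              [1., 0., 1., 0.],
              [1., 1., 0., 1.],
              [0., 0., 1., 0.]])
L = np.diag(A.sum(axis=1)) - A

# Orthonormal eigenvectors V and nonnegative eigenvalues lam (the graph frequencies).
lam, V = np.linalg.eigh(L)

x = np.array([1.0, 2.0, 0.5, -1.0])    # a graph signal
x_tilde = V.T @ x                      # GFT: x_tilde = V^T x
x_rec = V @ x_tilde                    # iGFT recovers x (orthonormality of V)
assert np.allclose(x_rec, x)

# Dirichlet energy TV_2(x) = x^T L x equals the edgewise sum A_ij (x_i - x_j)^2, i < j.
tv2 = x @ L @ x
tv2_check = sum(A[i, j] * (x[i] - x[j]) ** 2
                for i in range(4) for j in range(i + 1, 4))
assert np.isclose(tv2, tv2_check)

# Parseval: signal power is preserved across the vertex and frequency domains.
assert np.isclose(np.linalg.norm(x), np.linalg.norm(x_tilde))
print(lam)   # eigenvalues 0 = lam_1 <= ... <= lam_N act as graph frequencies
```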

Using (1) and following the rationale for undirected graphs, one can define a frequency ordering $\lambda_i \succeq \lambda_j$ if $\mathrm{TV}_1(v_i) > \mathrm{TV}_1(v_j)$ [8]. Although applicable to signals on digraphs, unlike (S1), the signal-variation measure (1) does not ensure that constant signals have zero variation. In addition, (generalized) eigenvectors of asymmetric GSOs need not be orthonormal, implying that Parseval's identity will not hold and, hence, the signal power is not preserved across the vertex and dual domains. In general, this can be an issue for graph-filtering methods operating in the spectral domain, thus motivating this article's overarching theme of relying on vertex-domain operations for extensions to digraphs. From a computational standpoint, obtaining the Jordan decomposition for moderate-sized graphs is expensive and often numerically unstable; see also [11] and the references therein for recent attempts toward mitigating this instability issue. Addressing the uniqueness of the representation is also critical when the GSO (even the combinatorial Laplacian) has repeated eigenvalues because the corresponding eigenspaces exhibit rotational ambiguities that can hinder the interpretability of graph-frequency analyses. To address this (often overlooked) shortcoming, [11] puts forth a quasi-coordinate-free GFT definition based on oblique spectral projectors. Other noteworthy GFT approaches rely on projections onto the (nonorthogonal) eigenvectors of a judicious random-walk operator on the digraph [9], [10]; the interested reader is referred to [9, Sec. 7] for a collection of examples involving semisupervised learning and signal modeling on digraphs.

Alternatives to the spectral GFT methods described thus far are surveyed in the following section. The focus shifts to orthonormal transform learning approaches, whereby optimization problems are formulated to find a suitable spectral representation basis for graph signals.

Digraph Fourier transforms: Orthonormal transform learning

The history of SP has repeatedly taught us how low frequencies are more meaningful in human speech for the purpose of compression, high frequencies represent borders in images whose identification is key for segmentation, and different principal components offer varying discriminative powers when it comes to face recognition. Although analogous interpretations are not always possible in more advanced representations obtained with modern tools such as learned overcomplete dictionaries and neural networks (NNs), at a basic level, it remains true that orthonormal linear transformations excel at separating signals from noise. Motivated by this general signal representation principle, a fresh look at the GFT for digraphs was put forth in [13] based on the minimization of the convex Lovász extension of the graph cut size (which can be interpreted as a measurement of signal variation on the graph capturing the edges' directionality), subject to orthonormality constraints on the desired basis. The rationale behind the graph cut criterion is that its minimization leads to identifying clusters in $\mathcal{G}$. Accordingly, the learned GFT basis in [13] tends to be constant across clusters of the graph, offering parsimonious spectral representations of signals that are real valued and piecewise-constant over said clusters. The price paid for all of these desirable properties is that the resulting GFT basis may fail to yield atoms capturing different levels of signal variation with respect to $\mathcal{G}$, and the optimization procedure in [13] is computationally expensive due to repeated singular-value decompositions.

A related (optimization-based) approach in [14] searches for an orthonormal digraph Fourier transform (DGFT) basis $U := [u_1, \ldots, u_N]\in\mathbb{R}^{N\times N}$, where $u_k\in\mathbb{R}^N$ represents the $k$th frequency component. Toward defining frequencies, a more general notion of signal-directed variation (DV) for digraphs is introduced as $\mathrm{DV}(x) := \sum_{i\neq j} A_{ji}[x_i - x_j]_+^2$, where $[x]_+ := \max(0,x)$ denotes projection onto the nonnegative reals. To gain insight on DV, consider a graph signal $x$ on the digraph $\mathcal{G}$, and suppose a directed edge represents the direction of signal flow from a larger value to a smaller one. Thus, an edge from node $i$ to node $j$ (i.e., $A_{ji} > 0$) contributes to $\mathrm{DV}(x)$ only if $x_i > x_j$. Moreover, notice that if $\mathcal{G}$ is undirected, then $\mathrm{DV}(x) \equiv \mathrm{TV}_2(x)$. Analogous to the GFTs surveyed in the "Digraph Fourier Transforms: Spectral Methods" section, we define the frequency $f_k := \mathrm{DV}(u_k)$ as the DV of the frequency component $u_k$. Because for all previous GFT approaches the spacing between frequencies can be highly irregular, the idea in [14] to better capture low, bandpass, and high frequencies is to design a DGFT such that the orthonormal frequency components are as spread as possible in the graph-spectral domain. Beyond offering parsimonious representations of slowly varying signals on digraphs, a DGFT with spread frequency components can facilitate more interpretable frequency analyses and aid filter design in the spectral domain. To this end, a viable approach is to minimize a so-termed spectral-dispersion criterion

$$U^* = \underset{U}{\arg\min}\ \sum_{i=1}^{N-1}\big[\mathrm{DV}(u_{i+1}) - \mathrm{DV}(u_i)\big]^2 \quad \text{s.t.}\ \ U^T U = I_N,\ \ u_1 = \tfrac{1}{\sqrt{N}}\mathbf{1}_N,\ \ u_N = \underset{\|u\|=1}{\arg\max}\ \mathrm{DV}(u). \qquad (2)$$

The cost function measures how well spread the corresponding frequencies are over $[0, \mathrm{DV}(u_N)]$. Having fixed the first and last columns of $U$, the dispersion function is minimized when the free DV values are selected to form an arithmetic sequence over the attainable bandwidth. However, as the variables here are the columns of $U$, we can expect to obtain only approximately equidistributed frequencies. Finding the global optimum of (2) is challenging due to the nonconvexity arising from the orthonormality (Stiefel manifold) constraints, yet a stationary point can be provably obtained via the algorithm in [14]. Accordingly, the basis $U^*$ in (2) and its counterpart in [13] may not be unique.

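To make the spectral constructions of this section concrete, the following sketch (ours; the toy digraph, weights, and random seed are arbitrary choices) computes the eigenvector-based digraph GFT $\tilde{x} = V^{-1}x$, orders the frequencies by the total variation in (1), and evaluates the directed variation DV of a signal. It assumes the chosen GSO is diagonalizable, as in the text.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 8
# Toy digraph: a directed cycle (so the spectral radius is nonzero) plus a few
# random, weighted directed edges; the resulting GSO S is nonsymmetric.
S = np.zeros((N, N))
for j in range(N):
    S[j, (j + 1) % N] = 1.0
S += (rng.random((N, N)) < 0.15) * rng.random((N, N))
np.fill_diagonal(S, 0.0)

# Digraph GFT from the eigendecomposition S = V diag(lam) V^{-1}.
lam, V = np.linalg.eig(S)              # possibly complex, nonorthogonal eigenvectors
V_inv = np.linalg.inv(V)
x = rng.standard_normal(N)
x_tilde = V_inv @ x                    # GFT coefficients (complex valued in general)
print(np.abs(V @ x_tilde - x).max())   # ~1e-15: the iGFT recovers x (Parseval need not hold)

# Frequency ordering via (1): TV_1(v) = ||v - S_bar v||_1 with S_bar = S / |lam_max|.
S_bar = S / np.max(np.abs(lam))
tv1 = np.array([np.sum(np.abs(V[:, k] - S_bar @ V[:, k])) for k in range(N)])
print(np.argsort(tv1))                 # eigenvectors ordered from low to high variation

# Directed variation of a real signal: DV(x) = sum_{i != j} A_ji * max(0, x_i - x_j)^2.
dv = sum(S[j, i] * max(0.0, x[i] - x[j]) ** 2
         for i in range(N) for j in range(N) if i != j)
print(dv)
```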
In the "Applications" section, we illustrate a graph signal-denoising task whereby the DGFT basis learned from (2) is used to decompose and then (low-pass) filter temperatures recorded across the United States.

Graph filters and nonlinear graph signal operators

Here we consider operators whose inputs and outputs are signals defined on a digraph [see Figure 1(a) for a pictorial representation]. These operators are not only used to process information defined on digraphs (see also the "Applications" section), but also to postulate generative signal models for network data and solve statistical inference tasks surveyed in the "Inverse Problems on Digraphs," "Statistical Digraph SP," and "Digraph Topology Inference" sections. A key aspect throughout the discussion is how the topology of the digraph impacts the transformation of signals. The section begins by discussing linear graph filters [2, Ch. 11] and then builds on those to describe nonlinear (deep) architectures. After a brief outline of the current filtering landscape for undirected graphs, we focus on recent progress to tackle the challenges faced when extending those operators to the directed case.

Linear graph filters

Several definitions for graph filters coexist in the GSP literature. Early works focused on using the graph Laplacian $L$ as the GSO and leveraged its eigendecomposition $L = V\,\mathrm{diag}(\lambda)\,V^T$ (see "A Motivating Starting Point: The Graph Fourier Transform for Undirected Graphs") to define the graph-filtering operation in the spectral domain [3]. Specifically, if $x$ denotes the input of the graph filter and $y$ is its output, filtering a graph signal is tantamount to transforming the input signal to the graph Fourier domain as $\tilde{x} = V^T x$, applying a pointwise (diagonal) operator in the spectral domain to generate the output $\tilde{y}$, and finally, transforming the obtained output back onto the vertex domain as $y = V\tilde{y}$. The pointwise spectral operator can be expressed as the multiplication by a diagonal matrix $\mathrm{diag}(\tilde{g})$ so that $\tilde{y} = \mathrm{diag}(\tilde{g})\tilde{x}$. Alternatively, one can adopt a scalar kernel function $g: \mathbb{R}\to\mathbb{R}$ applied to the eigenvalues of the Laplacian so that the frequency response of the filter can be obtained as $\mathrm{diag}(\tilde{g}) = \mathrm{diag}(g(\lambda))$, where $g(\cdot)$ is applied entrywise. Regardless of the particular choice, the input–output relation can be written as

$$y = V\,\mathrm{diag}(g(\lambda))\,V^T x = V\,\mathrm{diag}(\tilde{g})\,V^T x, \qquad (3)$$

with the $N\times N$ matrix $V\,\mathrm{diag}(g(\lambda))\,V^T$ representing the linear transformation in the nodal domain.

[Figure 1 graphic: (a) graph filters $H(\mathcal{G})$ as graph signal (GS) operators mapping a GS input to a GS output; (b) linear graph filters with nonsymmetric $S$, namely $H(h,S) := \sum_{l=0}^{L-1} h_l S^l$, $H_{\mathrm{nv}}(\{h_l\},S) := \sum_{l=0}^{L-1}\mathrm{diag}(h_l)S^l$, and $H_{\mathrm{ev}}(\{H_l\},S) := \sum_{l=0}^{L-1}(H_l\circ S)S^{l-1}$; (c) a nonlinear graph NN architecture with layers $z^{(0)} = x, \ldots, z^{(L_N)} = y$, each applying a graph-aware parametric linear transformation $T_{\theta^{(\ell)}}(\mathcal{G})$ followed by a graph-aware pointwise nonlinearity $\sigma^{(\ell)}$.]

FIGURE 1. (a) Graph filters as generic operators that transform a graph signal (GS) input into an output. The graph filter processes the features of the input, taking into account the topology of the digraph where the signals are defined. (b) The different types of linear graph filters: a regular (shift-invariant) graph filter $H$, a node-variant graph filter $H_{\mathrm{nv}}$, and an edge-variant graph filter $H_{\mathrm{ev}}$. The number of parameters (coefficients) is $L$, $NL$, and $EL$, respectively. Due to their polynomial definition, all of these filters can operate over digraphs (nonsymmetric $S$). (c) Nonlinear graph signal operators using a (potentially deep) NN with $L_N$ layers. Each layer consists of a parametrized graph-aware linear transformation (given, e.g., by any of the linear graph filters described previously) followed by a pointwise nonlinearity [cf. (6)–(8)].
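Before moving to the vertex-domain (polynomial) definition, the spectral filtering recipe in (3) can be sketched in a few lines for the undirected case. The example below is ours; the graph and the heat-kernel frequency response $g(\lambda) = e^{-\tau\lambda}$ are illustrative choices, not prescribed by the article.

```python
import numpy as np

# Undirected graph, Laplacian GSO, and a low-pass spectral filter per (3):
# y = V diag(g(lam)) V^T x, with the kernel g acting entrywise on the eigenvalues.
A = np.array([[0., 1., 0., 1., 0.],
              [1., 0., 1., 0., 0.],
              [0., 1., 0., 1., 1.],
              [1., 0., 1., 0., 0.],
              [0., 0., 1., 0., 0.]])
L = np.diag(A.sum(axis=1)) - A
lam, V = np.linalg.eigh(L)

def g(lmbda, tau=1.0):
    """Illustrative low-pass kernel: heat-diffusion response exp(-tau * lambda)."""
    return np.exp(-tau * lmbda)

x = np.array([1.0, -2.0, 0.3, 0.7, 2.5])       # input graph signal
x_tilde = V.T @ x                              # analysis (GFT)
y_tilde = g(lam) * x_tilde                     # pointwise (diagonal) spectral operator
y = V @ y_tilde                                # synthesis (iGFT)

# Equivalent vertex-domain linear transformation H = V diag(g(lam)) V^T.
H = V @ np.diag(g(lam)) @ V.T
print(np.abs(H @ x - y).max())                 # ~1e-15: both routes coincide
```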

An alternative definition consists of leveraging the interpretation of $S$ as a reference graph signal operator and then building more general linear operators of the form [7]

$$y = h_0 x + h_1 S x + \cdots + h_{L-1}S^{L-1}x := Hx, \quad \text{with } H := \sum_{l=0}^{L-1} h_l S^l, \qquad (4)$$

where the filter coefficients are collected in $h := [h_0, \ldots, h_{L-1}]^T$, with $L-1$ denoting the filter degree. Upon defining $x^{(l+1)} := Sx^{(l)}$ and $x^{(0)} = x$, the output $y$ in (4) can be equivalently written as $y = \sum_{l=0}^{L-1} h_l x^{(l)}$. As the application of the GSO $S$ requires only local exchanges among (one-hop) neighbors, the latter expression reveals that the operators in (4) can be implemented in a distributed fashion with $L-1$ successive exchanges of information among neighbors [15]. This is a key insight (and advantage) of (4) that will be leveraged in subsequent sections. Note that the coefficients $h$ can be given (e.g., when modeling known network-diffusion dynamics) or designed to accomplish a particular SP task, such as low-pass filtering (see, e.g., [15] for further details on graph-filter designs).

When $L = N$ and $\mathcal{G}$ is symmetrical so that the GSO is guaranteed to be diagonalizable, the two previous definitions can be rendered equivalent. But this is not the case when the graph filter is defined on a digraph. To see why, note that the polynomial definition in (4) is valid regardless of whether the GSO is symmetrical or not. Its interpretation as a local operator also holds true for digraphs provided that the notion of locality is understood, in this case, considering only the neighbors with incoming connections. The generalization of the definition in (3) to the directed case is, however, more intricate.

As explained in the "GSP Preliminaries, Frequency Analysis, and Signal Representations" section, different GFTs for digraphs exist. If the iGFT is given by the eigenvectors of the GSO, then one need only replace $V^T$ with the (nonorthogonal) $V^{-1}$ in (3). If the GSO is diagonalizable and $V^{-1}$ is adopted as the GFT, then the polynomial definition in (4) and the updated version of (3) are equivalent. If the GSO is not diagonalizable, the generalization of (3) is unclear, while (4) still holds. On the other hand, if the GFT is not chosen to be $V^{-1}$ but is one of the orthogonal (graph-smoothness-related) dictionaries $U$ presented in the "Digraph Fourier Transforms: Orthonormal Transform Learning" section, then the two definitions diverge. Specifically, linear operators of the form $U\,\mathrm{diag}(\tilde{g})\,U^T$ will be symmetrical (meaning that the influence of the input at node $i$ on the output at node $j$ will be the same as that of node $j$ on node $i$), while operators of the form $\sum_{l=0}^{L-1} h_l S^l$ will not. Equally important is that, although a polynomial filter can always be implemented using local exchanges, there is no guarantee that the symmetrical transformation $U\,\mathrm{diag}(\tilde{g})\,U^T$ can be implemented in a distributed fashion [15]. All in all, if the definition in (4) is adopted for the directed case, then graph filters are always well defined, their distributed implementation is still feasible, and the design and interpretation of the filter coefficients $h$ as weights given to the information obtained after successive local exchanges is preserved. Their interpretation as diagonal spectral operators only holds, however, if $V^{-1}$ is used as a GFT and the GSO at hand is diagonalizable.

Generalizations of graph filters were introduced within the class of linear graph-aware signal operators. These include node- [15] and edge-variant graph filters [16], whose respective expressions are given by

$$H_{\mathrm{nv}} := \sum_{l=0}^{L-1}\mathrm{diag}(h_l)\,S^l \quad \text{and} \quad H_{\mathrm{ev}} := \sum_{l=0}^{L-1}(H_l\circ S)\,S^{l-1}, \qquad (5)$$

where $\circ$ denotes the Hadamard product, $h_l$ is a vector of dimension $N$, and $H_l$ is a sparse matrix with the same support as $S$. Compared with its (node-invariant) counterpart in (4), we observe that the output generated by a node-variant filter can also be viewed as a linear combination of locally shifted inputs $x^{(l)} = S^l x$, but in this case, each node has the flexibility of using a different set of weights. The flexibility is even larger for edge-variant graph filters because nodes can change the weight they give to each of their neighbors (as opposed to node-variant filters, where each node applies a common weight to all of its neighbors). Because both $H_{\mathrm{nv}}$ and $H_{\mathrm{ev}}$ build on a polynomial definition, they can seamlessly operate over digraphs. They thus inherit most of the properties described for the original polynomial graph filters in (4).

Graph NN architectures

Graph filters have also been used to define nonlinear operators that account for the topology of the graph, such as median filters [17] and Volterra graph filters. All of these works build their definitions from the polynomial expression in (4) and hence can handle digraphs, although some of their properties (e.g., the conditions that a signal needs to satisfy to be a root of a median graph filter [17]) require minor modifications. A case of particular interest is that of deep graph NN (GNN) architectures [18], which have attracted significant attention in recent years for tackling machine learning problems involving network data. Traditional (e.g., convolutional) NNs have been remarkably successful in tasks involving images, video, and speech, all of which represent data with an underlying Euclidean domain that is regularly sampled over a grid-like structure. However, that structure, which one almost takes for granted, is missing when it comes to signals defined on graphs. As argued next, GSP offers an ideal framework to fill in this fundamental gap.

The overall idea in GNN architectures is to define an input–output relation by using a concatenation of $L_N$ layers composed of a linear transformation that combines the different signal values and a scalar (pointwise) nonlinear function that increases the expressiveness of the mapping. Mathematically, with $x$ and $y$ denoting the input and the output to the overall NN architecture and $\ell$ being the layer index, we have that

$$z^{(0)} = x, \ \text{and} \ y = z^{(L_N)}, \ \text{where} \qquad (6)$$
$$\hat{z}^{(\ell)} = T_{\theta^{(\ell)}}\{z^{(\ell-1)};\mathcal{G}\}, \quad 1 \le \ell \le L_N, \qquad (7)$$
$$z_{ij}^{(\ell)} = \sigma_{\mathcal{G}}^{(\ell)}\big([\hat{z}^{(\ell)}]_{ij}\big), \quad 1 \le \ell \le L_N, \ \text{and all } i,j. \qquad (8)$$

In (6)–(8), $z^{(\ell)}$ is the output of layer $\ell$ and serves as input to layer $\ell+1$.

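The vertex-domain filter in (4) is equally simple to implement, and doing so through successive shifts mirrors its distributed, digraph-friendly interpretation. The sketch below is an illustration written for this summary; `graph_filter` and the toy digraph are our own names and choices.

```python
import numpy as np

def graph_filter(S, h, x):
    """Apply y = sum_l h[l] S^l x via L-1 successive (local) shifts, as in (4)."""
    x_l = x.copy()          # x^(0) = x
    y = h[0] * x_l
    for hl in h[1:]:
        x_l = S @ x_l       # x^(l+1) = S x^(l): one exchange among in-neighbors
        y = y + hl * x_l
    return y

# Works for any (nonsymmetric) GSO: here a small weighted digraph adjacency.
S = np.array([[0.0, 0.8, 0.0, 0.0],
              [0.0, 0.0, 1.0, 0.5],
              [0.7, 0.0, 0.0, 0.0],
              [0.0, 0.0, 0.9, 0.0]])
h = np.array([1.0, 0.5, 0.25])          # filter taps, degree L-1 = 2
x = np.array([1.0, 0.0, -1.0, 2.0])

y = graph_filter(S, h, x)
H = h[0] * np.eye(4) + h[1] * S + h[2] * (S @ S)
print(np.abs(H @ x - y).max())          # ~1e-16: matches the matrix polynomial H x
```

The node- and edge-variant filters in (5) admit the same shift-and-accumulate structure; only the per-shift weighting changes.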
The transformation $T_{\theta^{(\ell)}}\{\cdot;\mathcal{G}\}$ is the linear operator implemented at layer $\ell$, $\theta^{(\ell)}$ are the learnable parameters that define such a transformation, and $\sigma_{\mathcal{G}}^{(\ell)}: \mathbb{R}\to\mathbb{R}$ is a scalar nonlinear operator (possibly different per layer). When applied to graph signals, the NN architecture in (6)–(8) must account for the topology of the graph and, for that reason, the dependence of both the linear and nonlinear operators on $\mathcal{G}$ was made explicit. In most works, the role of the graph is considered when defining the linear operator $T_{\theta^{(\ell)}}\{\cdot;\mathcal{G}\}$, with the most widely used approach for (convolutional) GNNs being to replace $T_{\theta^{(\ell)}}\{\cdot;\mathcal{G}\}$ with a graph filter. It is precisely in inspiring this approach that GSP insights and advances have been transformative, because basic shift-invariance properties and convolution operations are otherwise not well defined for graph signals [19].

Early contributions following the graph-filtering rationale have emerged from the machine learning community. The spectral approach in [19] relies on the Laplacian eigenvectors $V$ and parametrizes the transformation $T_{\theta^{(\ell)}}\{\cdot;\mathcal{G}\} = V\,\mathrm{diag}(\theta^{(\ell)})\,V^T$ via $\tilde{g} = \theta^{(\ell)}$, the filter's frequency response in (3) that is learned using backpropagation. Although successful in many applications, when dealing with digraphs these approaches suffer from the same limitations as those discussed for their linear counterparts. Moreover, scalability is often an issue due to the computational burden associated with calculating the eigenvectors of large (albeit sparse) graphs. Alternative architectures proposed replacing $T_{\theta^{(\ell)}}\{z^{(\ell-1)};\mathcal{G}\}$ with $(I - \theta^{(\ell)}A)\,z^{(\ell-1)}$, where $A$ is the (possibly nonsymmetric) adjacency matrix of the graph, and $\theta^{(\ell)}$ is a learnable scalar. To increase the number of parameters, some authors have considered learning the nonzero entries of $A$, assuming that its support is known. A more natural approach is to replace $T_{\theta^{(\ell)}}\{\cdot;\mathcal{G}\}$ with the polynomial filter in (4) and consider the filter taps $h = \theta^{(\ell)}$ as the parameters to be learned (see [18] and the references therein). Once again, implementing (6)–(8) with $H^{(\ell)} = \sum_{l=0}^{L_\ell-1} h_l^{(\ell)}S^l$ in lieu of $T_{\theta^{(\ell)}}\{\cdot;\mathcal{G}\}$ exhibits a number of advantages because 1) the graph filter is always well defined (even for nondiagonalizable GSOs), 2) the degree of the filter controls the complexity of the architecture (the number of learnable parameters), and 3) the polynomial definition guarantees that the resultant graph filter can be implemented efficiently (via the successive application of sparse matrices), which is essential in scaling to large data sets. As in standard NN architectures, GNN parameters (i.e., the filter coefficients for each of the layers) are learned using stochastic gradient descent. For supervised learning tasks, the goal is to minimize a suitable loss function over a training set of (labeled) examples. The sparsity of $S$ and the efficient implementation of polynomial graph filters [compare (3)] are cardinal properties to keep the overall computational complexity in check.

Beyond convolutional GNNs, the aforementioned findings are also valid for recurrent GNNs. Furthermore, one can also replace the graph filter $H^{(\ell)} = \sum_{l=0}^{L_\ell-1} h_l^{(\ell)}S^l$ either with a set of parallel filters, or with its node-variant $H_{\mathrm{nv}}^{(\ell)}$ or edge-variant $H_{\mathrm{ev}}^{(\ell)}$ counterparts. All of them preserve the distributed implementation of (4) while increasing the number of learnable parameters. As a result, the use of polynomial-based graph filter definitions that operate directly in the nodal domain to design NN architectures for digraphs opens a number of research avenues for deep learning over digraphs (see, e.g., [18]–[20] as well as other relevant articles in this special issue for additional details).

Inverse problems on digraphs

Inverse problems such as sampling and deconvolution have played a central role in the development of GSP. Different modeling assumptions must be considered when addressing these problems for digraphs; a good practice is to leverage the concepts introduced in the "GSP Preliminaries, Frequency Analysis, and Signal Representations" and "Graph Filters and Nonlinear Graph Signal Operators" sections and to balance practical utility with mathematical tractability. For instance, parsimonious signal models based on graph smoothness or bandlimitedness are widely adopted. Alternatively, observations can be modeled as the outputs of graph filters driven by white, sparse, or piecewise constant inputs. This approach is particularly useful in applications dealing with diffusion processes defined over real-world networks with directional links. In this section, we formally introduce a selection of prominent inverse problems, present established approaches for their solution, and identify the main challenges when the signals at hand are defined over digraphs.

Sampling and reconstruction

The sampling of graph signals and their subsequent reconstruction have arguably been the most widely studied problems within GSP [2, Ch. 9]. Broadly speaking, the objective is to infer the value of the signal at every node from the observations at a few nodes by leveraging the structure of the graph. To describe the problem formally, let us introduce the fat, binary, $M\times N$ sampling matrix $C_M$ and define the sampled signal as $\bar{x} = C_M x$. Notice that if $\mathcal{M}$ represents the subset of $M < N$ nodes where the signal is sampled, $C_M$ has exactly one nonzero element per row, and the positions of those nonzero elements correspond to the indices of the nodes in $\mathcal{M}$. Then, the signal $\bar{x}$ is indeed a selection of $M$ out of the $N$ elements of $x$. This raises two fundamental questions: How can we reconstruct $x$ from $\bar{x}$, and how can we design $C_M$ to facilitate this reconstruction?

Starting with the first question, early works assumed the graph to be undirected and the signal $x$ to be bandlimited, i.e., to be a linear combination of just a few leading eigenvectors of the GSO. The GSO was typically set to the Laplacian $L$, with its eigenvectors $V = [v_1, \ldots, v_N]$ being real valued and orthogonal. That is, the signal was assumed to be expressible as $x = \sum_{k=1}^{K}\tilde{x}_k v_k := V_K\tilde{x}_K$, where $\tilde{x}_K\in\mathbb{R}^K$ collects the $K$ active frequency coefficients, and $V_K$ is a submatrix of the GFT.

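Returning to the GNN recursion (6)–(8), a bare-bones forward pass can be sketched by taking each layer's linear operator to be the polynomial filter of (4), as advocated in the text. The code below is a single-feature toy illustration (ours); practical GNNs operate on feature matrices and learn the taps by backpropagation rather than fixing them randomly.

```python
import numpy as np

def gnn_forward(S, x, taps_per_layer, sigma=np.tanh):
    """Minimal convolutional-GNN forward pass per (6)-(8): each layer applies a
    polynomial graph filter (the graph-aware linear map) followed by a pointwise
    nonlinearity. taps_per_layer lists the filter coefficients of each layer."""
    z = x                                          # z^(0) = x
    for h in taps_per_layer:                       # layers l = 1, ..., L_N
        x_l, z_hat = z, h[0] * z
        for hl in h[1:]:                           # graph filter via successive shifts
            x_l = S @ x_l
            z_hat = z_hat + hl * x_l
        z = sigma(z_hat)                           # pointwise nonlinearity
    return z                                       # y = z^(L_N)

# Example on a small digraph (nonsymmetric S); the taps would be learned in practice.
rng = np.random.default_rng(1)
S = (rng.random((6, 6)) < 0.3) * rng.random((6, 6))
np.fill_diagonal(S, 0.0)
x = rng.standard_normal(6)
y = gnn_forward(S, x, taps_per_layer=[rng.standard_normal(3) for _ in range(2)])
print(y)
```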
Indeed, because the leading eigenvectors in $V$ are those with the smallest total variation [cf. (S1)], this model was originally motivated by the practical importance of signals that vary smoothly with the underlying graph. Under the bandlimited assumption, the sampled signal $\bar{x}$ is given by $\bar{x} = C_M x = C_M V_K\tilde{x}_K$. Clearly, if the linear transformation represented by matrix $C_M V_K\in\mathbb{R}^{M\times K}$ has full column rank (that is, if the rank of $C_M V_K$ is equal to $K$), then $\tilde{x}_K$ can be recovered from $\bar{x}$. Once the coefficients $\tilde{x}_K$ are known, the signal in the original domain can be found as $x = V_K\tilde{x}_K = V_K(C_M V_K)^{\dagger}\bar{x}$. Hence, the critical factor used to characterize the recovery of $x$ from $\bar{x}$ is the invertibility (and conditioning) of matrix $C_M V_K$, which is a submatrix of $V$ formed by the $K$ columns corresponding to the active frequencies and the $M$ rows corresponding to the sampled nodes in $\mathcal{M}$. Note that a key difference with sampling in classical SP is that designing matrix $C_M$ as a regular sampler is meaningless in GSP because the node indexing is completely arbitrary.

Indeed, multiple approaches have been proposed to identify the most informative nodes on a graph for subsequent reconstruction. This is tantamount to leveraging the (spectral) properties of $C_M V_K$ to design sampling matrices $C_M$ that lead to an optimal reconstruction. For example, by maximizing the minimum singular value of $C_M V_K$, the sampling set is designed to minimize the effect of noise in a mean-squared-error sense [21]. The fact that the reconstruction matrix is a submatrix of the eigenvectors of the graph has also been exploited to design optimal low-pass graph-filtering operators that can reconstruct the original signal $x$ by implementing local exchanges [22] as well as efficient algorithms that leverage the sparsity of the graph to compute $V_K$ efficiently [23].

When dealing with the sampling and reconstruction of signals defined on digraphs, a number of challenges arise. As introduced in the "Digraph Fourier Transforms: Spectral Methods" section, multiple definitions of the GFT coexist for digraphs. Some of those are based on generalizations of smoothness and lead to real-valued orthogonal dictionaries. In those cases, the results presented for signals in undirected graphs still hold, but the connections with polynomial low-pass filtering and the ability to find the eigenvectors efficiently are lost. Alternatively, one can use (a subset of) the eigenvectors of the nonsymmetric GSO as the basis for the signal $x$. The caveats are, in this case, that the GSO needs to be diagonalizable and that the resulting eigenvectors $V$ are neither orthogonal nor real valued. The latter point implies that the frequency coefficients $\tilde{x}_K$ are complex valued as well, so that the recovery methods for digraphs must be conceived in the complex field. Regarding the loss of orthogonality, this will typically deteriorate the conditioning of the submatrix $C_M V_K$, which is critical in regimes where noise is present and $M$ is close to $K$. Hence, when dealing with the sampling of real-world signals defined over digraphs, a first step is to decide which type of signal dictionary is going to be used. This likely depends on the prior domain knowledge as well as on the properties of the signals at hand. If no prior knowledge exists, schemes considering different dictionaries (at the expense of increasing the sample complexity) may be prudent. Moreover, in the cases where the selected basis is composed of the eigenvectors of the GSO, the recovery problems need to be formulated in the complex domain, and oversampling is likely to be required in scenarios where noise, outliers, or model mismatches are present.

Additional models for the observed signal have been studied in the digraph literature, including the cases where 1) the $K$ dictionary atoms spanning $x$ are not known a priori (thus leading to a sparse regression problem) [21], 2) the observations do not correspond to values of $x$ but rather of $S^i x$ for varying $i$ (which can be interpreted as sampling an evolving network process as opposed to a static one) [24], 3) total-variation metrics are considered in the form of regularizers or constraints [25], and 4) the signal $x$ is modeled as the output of a graph filter excited by a structured input [5], [26]. We revisit the two last cases while studying the next collection of inverse problems.

(Blind) deconvolution, system identification, and source localization

We now introduce a family of recovery and reconstruction problems involving signals over digraphs. The common denominator across all of them is the assumption that the generative model $y = Hx$ holds, where $y$ is a (partially) observed graph signal, $H$ is a linear graph filter, and $x$ is a potentially unknown and structured input. Building on this model and assuming that we have access to samples of the output $y$, the supporting digraph, and side information on $H$ and $x$, the goal is to recover 1) the graph filter $H$ (system identification), 2) the values of $x$ (deconvolution), 3) the support of $x$ (source localization), and 4) both the graph filter and the values of $x$ (blind deconvolution).

Because graph filters can be efficiently used to model local diffusion dynamics, the relevance of the aforementioned schemes goes beyond signal reconstruction and permeates to broader domains, such as opinion formation and source identification in social networks, inverse problems of biological signals supported on graphs, and the modeling and estimation of diffusion processes in multiagent networks, all of which are typically directed. In particular, we envision applications in marketing where, e.g., social media advertisers want to identify a small set of influencers so that an online campaign can go viral; in health-care policy, implementing network analytics to infer hidden needle-sharing networks of injecting drug users; or in environmental monitoring, using wireless sensor networks to localize heat or seismic sources.

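The bandlimited sampling-and-reconstruction pipeline described above reduces to a short least-squares computation. The sketch below (our illustration; the random undirected graph, bandwidth $K$, and sample size $M$ are arbitrary) samples a $K$-bandlimited signal at $M$ nodes and recovers it via $\hat{x} = V_K(C_MV_K)^{\dagger}\bar{x}$.

```python
import numpy as np

rng = np.random.default_rng(2)

# Undirected graph and Laplacian eigenvectors as the dictionary.
N, K, M = 10, 3, 4
A = (rng.random((N, N)) < 0.4).astype(float)
A = np.triu(A, 1); A = A + A.T                        # symmetric adjacency
L = np.diag(A.sum(axis=1)) - A
lam, V = np.linalg.eigh(L)

V_K = V[:, :K]                                        # K leading (smoothest) eigenvectors
x = V_K @ rng.standard_normal(K)                      # K-bandlimited signal

# Sample M nodes (chosen at random here; in practice one optimizes C_M V_K).
sampled = rng.choice(N, size=M, replace=False)
C_M = np.zeros((M, N)); C_M[np.arange(M), sampled] = 1.0
x_bar = C_M @ x

# Recover: x_hat = V_K (C_M V_K)^+ x_bar, exact whenever C_M V_K has full column rank.
x_hat = V_K @ np.linalg.pinv(C_M @ V_K) @ x_bar
print(np.abs(x_hat - x).max())                        # ~1e-15 if rank(C_M V_K) = K
```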
As an encompassing formal framework, consider the following optimization problem:

$$\{x^*, h^*, y^*\} = \underset{\{x,h,y\}}{\arg\min}\ \mathcal{L}_0\Big(y, \sum_{l=0}^{L-1} h_l S^l x\Big) + \alpha_x r_x(x) + \alpha_h r_h(h) + \alpha_y r_y(y), \quad \text{s.t.}\ x\in\mathcal{X},\ h\in\mathcal{H},\ y\in\mathcal{Y}, \qquad (9)$$

where $\mathcal{L}_0$ is a loss function between the observed signal $y$ and its prediction generated by the chosen $x$ and $h$. The regularizers $r_x$, $r_h$, and $r_y$ promote desirable features on the optimization variables, and $\mathcal{X}$, $\mathcal{H}$, and $\mathcal{Y}$ represent prespecified feasibility sets. Although for the undirected case the generative filter $H$ can be defined either in the spectral or in the vertex domain, in (9) the polynomial form has been selected. As pointed out in the "Graph Filters and Nonlinear Graph Signal Operators" section, the reasons for this choice are multiple: polynomial filters are always well defined (even for nondiagonalizable GSOs); the number of parameters is $L$ (in contrast with $N$ for those spectral formulations that do not consider an explicit parametrization), which is beneficial in the context of inverse problems; and the filter can be used to capture distributed diffusion dynamics on directed networks, strengthening the practical value of the formulation in (9). Finally, even though (9) was posited for the generic case where both $x$ and $h$ are unknown and $y$ might be only partially observed, it can readily incorporate perfect knowledge of any of these variables just by fixing its value and dropping the corresponding feasibility constraint and regularization term.

Focusing first on the problem of deconvolution, notice that the nonsymmetric filter $H$ is completely known because both the GSO $S$ and the filter coefficients $h$ are assumed to be given. The goal then is to use incomplete observations of $y$ to recover the values of $y$ in the nonobserved nodes and to obtain the seeding values in $x$. Leveraging the notation introduced in the "Sampling and Reconstruction" section, we denote by $\bar{y} = C_M y = H_M x$ the sampled output, with $H_M = C_M H$ being the corresponding $M$ rows of $H$. As $y$ and $x$ are graph signals of the same size, the deconvolution problem is ill posed when $M < N$. Hence, to overcome this, we may assume some structural prior on the input $x$. A common assumption is that $x$ is sparse. This corresponds to setups where the observed signal $y$ can be accurately modeled by a few sources percolating across the entire network. Applications fitting this setup range from social networks where a rumor originated by a small group of people is spread across the network via local opinion exchanges, to brain networks where an epileptic seizure emanating from a few regions is later diffused across the entire brain [22]. Formally, (9) reduces to

$$x^* = \underset{x}{\arg\min}\ \|\bar{y} - H_M x\|_2^2 + \alpha_x\|x\|_1, \qquad (10)$$

which is a classical sparse-regression problem with well-established results showing that the recovery performance provably depends on the coherence of the nonsymmetric matrix $H_M$. The $\ell_1$-norm regularizer in (10) acts as a convex surrogate of the sparsity-measuring $\ell_0$ pseudonorm. Whenever sparsity is assumed as a structural property of the input and the emphasis is on recovering the support of $x$, (10) and variations thereof (with imperfect knowledge of $h$) are referred to as source localization problems. In terms of the samples of $y$ that are observed, the optimal selection (in cases where this selection can be designed) is nontrivial, and considerations similar to those discussed in the "Sampling and Reconstruction" section apply here as well. Finally, note that the generative model $y = Hx$ can also be used for undirected graphs and, as a result, the formulation in (10) and the associated algorithms can be used in such a case, the main difference being that the theoretical analysis of identifiability and recovery is simpler when the GSO (and hence the filter) is symmetrical.

Moving on to the system identification problem, where the main objective is to find the filter coefficients $h$, it is crucial to note that $y$ is a bilinear function in $h$ and $x$. Hence, if we assume that $x$ is given, then the system identification problem is very similar to the deconvolution problem where the roles of $x$ and $h$ are interchanged. In terms of structural priors for an unknown $h$, sparsity can also be employed. More specifically, it is instrumental to consider a weighted $\ell_1$-norm regularization $r_h(h) = \|\mathrm{diag}(\omega)h\|_1$, where $\omega\in\mathbb{R}_+^L$ is a weighting vector whose entries increase with the index $l = 1, \ldots, L$. In this way, the coefficients associated with higher powers of $S$ in the filter specification are more heavily penalized, thus promoting a low-complexity and numerically stable model for explaining the observed data. For undirected graphs, the cost $\mathcal{L}_0$ that enforces the generative graph filter model to hold is often formulated in the spectral domain, bypassing the need for computing the powers of $S$. Although we advocate working in the nodal domain, when the GSO is diagonalizable, formulating the problem in the spectral domain is also feasible for the directed case. The matrices mapping the unknown $h$ to the observations $\bar{y}$ would be complex valued, but the optimization would still be carried over the real-valued vector $h$. From an algorithmic perspective, the main challenge would be to find the eigenvectors of the nonsymmetric $S$, while from an analytical point of view, the issue would be the characterization of the conditioning of the (complex-valued) matrix that maps $h$ to $\bar{y}$.

The more challenging problem of blind deconvolution arises when both the input $x$ and the filter coefficients $h$ are unknown. To formally tackle this problem, we explicitly write the fact that $y$ is a bilinear function of $h$ and $x$ as $y = \mathcal{A}(xh^T)$, where the linear operator $\mathcal{A}$ is a function of the nonsymmetric $S$ and acts on the outer product of the sought vectors. A direct implementation of the general framework (9) can be employed for the problem of blind identification, where the goodness-of-fit loss $\|y - \mathcal{A}(xh^T)\|_2^2$ is combined with structure-promoting regularizers for both $x$ and $h$. Notice, however, that this leads to a nonconvex optimization problem for which alternating minimization schemes (e.g., a block-coordinate descent method that alternates between $x$ and $h$) can be implemented. To derive a convex relaxation, notice that $y$ is a linear function of the entries of the rank-one matrix $Z = xh^T$.

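As a concrete illustration of the sparse deconvolution problem (10), the sketch below (ours) solves it with a plain iterative soft-thresholding (ISTA) loop; any sparse-regression solver would do, and the toy digraph, filter taps, and regularization weight are arbitrary choices.

```python
import numpy as np

def ista(H_M, y_bar, alpha, n_iter=500):
    """Solve min_x ||y_bar - H_M x||_2^2 + alpha * ||x||_1 [cf. (10)] with ISTA."""
    step = 0.5 / np.linalg.norm(H_M, 2) ** 2           # 1 / Lipschitz const. of the gradient
    x = np.zeros(H_M.shape[1])
    for _ in range(n_iter):
        grad = 2.0 * H_M.T @ (H_M @ x - y_bar)         # gradient of the quadratic loss
        z = x - step * grad
        x = np.sign(z) * np.maximum(np.abs(z) - alpha * step, 0.0)   # soft thresholding
    return x

rng = np.random.default_rng(3)
N, M, L = 12, 8, 3
S = (rng.random((N, N)) < 0.25) * rng.random((N, N)); np.fill_diagonal(S, 0.0)
h = np.array([1.0, 0.6, 0.3])
H = sum(h[l] * np.linalg.matrix_power(S, l) for l in range(L))   # known graph filter

x_true = np.zeros(N); x_true[[2, 7]] = [1.5, -1.0]               # sparse sources
y = H @ x_true
sampled = rng.choice(N, size=M, replace=False)
H_M, y_bar = H[sampled], y[sampled]                              # partial observations

x_hat = ista(H_M, y_bar, alpha=0.05)
print(np.round(x_hat, 2))        # support should (approximately) match that of x_true
```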
Problem | Given | Find | Fig.
System identification | $x$, $y$ | $h$ | (a)
Deconvolution | $h$, $y$ | $x$ | (a)
Source localization | $h$, $y$ | $\mathrm{supp}(x)$ | (a)
Blind deconvolution | $y$ | $x$, $h$ | (a)
MIMO blind deconvolution | $\{y_i\}_{i=1}^{P}$ | $\{x_i\}_{i=1}^{P}$, $h$ | (b) left
SIMO blind deconvolution | $\{y_i\}_{i=1}^{P}$ | $x$, $\{h_i\}_{i=1}^{P}$ | (b) middle
Blind demixing | $y$ | $\{x_i\}_{i=1}^{P}$, $\{h_i\}_{i=1}^{P}$ | (b) right

[Figure 2 schematic: panel (a) depicts the single-filter pipeline $x \to h \to y \to C_M \to \bar{y}$; panel (b) depicts, from left to right, the MIMO configuration (inputs $x_1,\ldots,x_P$ through a common filter $h$ yielding outputs $y_1,\ldots,y_P$), the SIMO configuration (a common input $x$ through filters $h_1,\ldots,h_P$), and blind demixing (the outputs of $P$ input–filter pairs summed into a single observation $y$).]

FIGURE 2. A summary of the inverse problems introduced. In the schematic representation, graph signals are depicted as red circles and graph operators as blue rectangles. Notice that filters are functions of the coefficients $h$ and the GSO $S$ [cf. (4)]. However, because $S$ is assumed to be known for every problem considered, we succinctly represent filters by their coefficients, $h$. The first four problems refer to the single-input, single-output scenario, with blind deconvolution being the most challenging because only (a sampled version of) the output is observed. Note that the problem frameworks in (b) can be further extended to the case where the output is partially observed, as in (a). We omit this illustration to minimize redundancy and because these more challenging problems are generally ill posed even in the case where $y$ is fully observed.

This motivates the statement of the following convex optimization problem:

$$Z^* = \underset{Z}{\arg\min}\ \|y - \mathcal{A}(Z)\|_2^2 + \alpha_1\|Z\|_* + \alpha_2\|Z\|_{2,1}. \qquad (11)$$

The nuclear-norm regularizer $\|\cdot\|_*$ in (11) promotes a low-rank solution because we know that $Z$ should be the outer product of the true variables of interest, that is, $x$ and $h$. On the other hand, the $\ell_{2,1}$ mixed norm $\|Z\|_{2,1} = \sum_{i=1}^{N}\|z_i\|_2$ is the sum of the $\ell_2$ norms of the rows of $Z$, thus promoting a row-sparse structure in $Z$. This is aligned with a sparse input $x$ forcing rows of $Z$ to be entirely zero from the outer product. After solving for $Z^*$, one may recover $x$ and $h$ from, e.g., a rank-one decomposition of $Z^*$.

Extensions to multiple-input, multiple-output (MIMO) pairs (with a common filter), along with theoretical guarantees for the case where the GSO $S$ is normal (i.e., $SS^H = S^H S$), can be found in [5]. Interestingly, it was empirically observed and theoretically demonstrated that blind deconvolution in circulant graphs (such as the directed cycle that represents the domain of classic SP) corresponds to the most favorable setting. The related case of a single graph signal as the input to multiple filters (generating multiple outputs) was recently studied in [27], thus providing a generalization of the classical blind multichannel identification problem in digital SP. Moreover, [26] addresses the blind demixing case where a single observation formed by the sum of multiple outputs is available, and it is assumed that these outputs are generated by different sparse inputs diffused through different graph filters. This variation of the problem is severely ill posed, and strong regularization conditions should be assumed to ensure recovery, with the problem being easier if the graph filters are defined over different GSOs $\{S_i\}_{i=1}^{I}$. Figure 2 provides an overarching view of the problems mentioned in this section. See also [25] for additional signal-recovery problems that can be written in the encompassing framework of (9).
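The alternating (block-coordinate) minimization route mentioned above for blind deconvolution is easy to prototype: fix $h$ and take a sparse-regression step in $x$, then fix $x$ and solve a least-squares problem in $h$. The sketch below is a toy illustration of that scheme (ours, with arbitrary problem sizes); whether the planted pair is recovered, and only up to the inherent scaling ambiguity, depends on the usual identifiability conditions.

```python
import numpy as np

def filt(S, h):
    """Polynomial graph filter H = sum_l h[l] S^l."""
    return sum(hl * np.linalg.matrix_power(S, l) for l, hl in enumerate(h))

def lasso_step(A, b, alpha, x0, n_iter=200):
    """Inner ISTA solver for min_x ||b - A x||_2^2 + alpha ||x||_1."""
    t = 0.5 / np.linalg.norm(A, 2) ** 2
    x = x0.copy()
    for _ in range(n_iter):
        z = x - 2.0 * t * (A.T @ (A @ x - b))
        x = np.sign(z) * np.maximum(np.abs(z) - alpha * t, 0.0)
    return x

rng = np.random.default_rng(4)
N, L = 12, 3
S = (rng.random((N, N)) < 0.3) * rng.random((N, N)); np.fill_diagonal(S, 0.0)
h_true = np.array([1.0, 0.7, 0.2])
x_true = np.zeros(N); x_true[[1, 6]] = [2.0, -1.5]           # sparse sources
y = filt(S, h_true) @ x_true                                 # fully observed output

# Block-coordinate descent: sparse x-step (ISTA) and least-squares h-step.
h = np.array([1.0, 0.0, 0.0])                                # start from the identity filter
x = np.zeros(N)
for _ in range(20):
    x = lasso_step(filt(S, h), y, alpha=0.05, x0=x)                       # update x | h
    B = np.column_stack([np.linalg.matrix_power(S, l) @ x for l in range(L)])
    h, *_ = np.linalg.lstsq(B, y, rcond=None)                             # update h | x

print(np.round(x, 2), np.round(h, 2))   # compare (up to scaling) with x_true and h_true
```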
As previously explained, a key feature that allows using the problem formulations introduced in this section for signals defined on digraphs is that the generative graph filter was incorporated in polynomial form. Unfortunately, many of the theoretical guarantees for solving these problems rely heavily on the spectral analysis of the GSO, thus assuming symmetry or at least normality of $S$. One of the main remaining challenges for inverse problems in digraphs is the derivation of recovery guarantees, along with the identification of key performance drivers, that can accommodate nondiagonalizable GSOs and generalized (complex) eigenvectors. Another important potential research direction is the incorporation of alternative generative models by replacing graph filters with the more general graph-signal operators presented in the "Graph Filters and Nonlinear Graph Signal Operators" section, such as node- and edge-variant filters (5) or GNNs (6)–(8). Especially in this latter case, system identification and blind deconvolution would become extremely challenging due to the incorporation of nonlinearities, making the convex relaxation in (11) based on the linear operator $\mathcal{A}$ no longer valid.

Statistical digraph SP

Randomness is pervasive in engineering, and graph signals are no exception. For this reason, here we build upon the results presented in the previous sections to discuss recent advances and challenges in developing statistical models for random graph signals defined over digraphs. In the field of statistics, graphs quickly emerged as a convenient, intuitive mathematical structure to describe complex statistical dependencies across multidimensional variables. A prominent example is that of Markov random fields (MRFs), which are symmetrical graphical models whose edges capture conditional dependencies across the variables represented by the nodes. Inference over MRFs is computationally affordable and, for the particular case of the signals being Gaussian, the graph describing the MRF can be inferred directly from the precision (inverse covariance) matrix of the data. In parallel, digraphs have been used to capture one-directional conditional dependence (hence causal) relations, with Bayesian networks (which, in addition to being directed, are acyclic) being the most tractable graphical model within this class.

The GSP literature has also contributed to the statistical modeling of random graph signals. The first step is to postulate how the graph structure plays a role in shaping the signal's statistical properties and then analyze how the model put forth can be used to tackle inference tasks more effectively. As in the case of graphical models, most existing results focused on undirected graphs.

Arguably, the most relevant line of work has been the generalization of the definition of weak stationarity to signals [...] or, equivalently, 2) the process can be modeled as the output of a graph filter excited with a white input. This allowed for establishing parallelisms with the classical definition of weak stationarity for time-varying signals and opened the door to the development of efficient algorithms that estimate the second moment of a graph stationary process using fewer samples. For example, if the eigenvectors of the covariance and the GSO are the same, instead of estimating the $N^2$ entries of the covariance matrix, one can focus on estimating only its $N$ eigenvalues.

However, from the initial discussion in the "Graph Filters and Nonlinear Graph Signal Operators" section, it follows that this convenient equivalence between the frequency and the vertex domains does not hold for digraphs. Indeed, if the GSO is not a normal matrix, its eigenvectors cannot coincide with those of the covariance matrix, which is guaranteed to be normal. As a result, one must adapt the definitions and sacrifice some of the properties shown for the symmetrical case. To be mathematically precise, let us recall that $x$ is a zero-mean random process defined on the digraph $\mathcal{G}$ with GSO $S$, and let us denote by $C_x := \mathbb{E}[xx^T]$ the $N\times N$ covariance matrix of $x$. We say that the random graph signal $x$ is stationary in the nonsymmetric $S$ if it can be described as

$$x = Hw, \quad \text{with } H := \sum_{l=0}^{L-1} h_l S^l \ \text{and}\ \mathbb{E}[ww^T] = I, \qquad (12)$$

where $L \le N$, and $w$ is a white zero-mean random signal. By adopting the generative model in (12), it follows that the covariance of $x$ can be written as $C_x = \mathbb{E}[xx^T] = H\,\mathbb{E}[ww^T]\,H^T = HH^T$, which is not a polynomial in $S$, but in both $S$ and $S^T$. As a result, it is no longer true that $C_x$ is diagonalized by the GFT associated with $S$. Nonetheless, the model in (12) is still extremely useful because it 1) provides an intuitive explanation of the notion of graph stationarity; 2) can be used to establish connections with higher-order, autoregressive directed structural equation models (SEMs) in statistics; and 3) gives rise to efficient estimators that, rather than targeting the estimation of the full covariance, try to estimate the filter coefficients $h$. Indeed, this latter point is also relevant for undirected graphs.

Although approaches that focus on the spectral definition of stationary processes require estimating the $N$ eigenvalues of $C_x$ (i.e., the power spectral density of the process $x$), the generative approaches based on (12) open the door to
supported on either undirected graphs or on graphs whose imposing additional structure on the generative filter. For
GSO is a normal matrix [2, Ch. 12]. Although this latter example, one can consider a finite-impulse/infinite-impulse
characterization includes some digraphs (such as circulant response filter with a number of coefficients L much smaller
and skew-Hermitian), the definitions cannot be applied to a than the number of nodes N, which results in considerable
generic nonsymmetric GSO. The key contribution of the arti- gains in terms of either the sampling complexity or the esti-
cles reviewed in [2, Ch. 12] was to provide a dual definition mation error.
for stationary graph processes, which was consistent with the The generative model in (12) can be generalized or con-
vertex and frequency interpretations of graph signals. Spe- strained to fit a range of suitable scenarios. Focusing first on
cifically, it was stated that a zero-mean random graph signal x the input signal, cases of practical interest include 1) consid-
was weakly stationary on a known graph G if 1) its covari- ering nonwhite input processes w with known covariance, 2)
ance matrix has the same eigenvectors as those of the GSO; requiring w to not only be white but also independent, and 3)

IEEE SIGNAL PROCESSING MAGAZINE | November 2020 | 109


Authorized licensed use limited to: UNIVERSITY OF ROCHESTER. Downloaded on October 29,2020 at 21:03:39 UTC from IEEE Xplore. Restrictions apply.
particularizing the distribution of w to tractable and practically rected graphs, which naturally give rise to more tractable (and
meaningful cases. Two examples that fall into the final cat- often uniquely identifiable) formulations [12]. Therefore, next,
egory are modeling w as either a Gaussian or a (signed) Ber- we outline a few noteworthy digraph topology-identification
noulli vector, which is particularly relevant in the context of approaches that are relevant to (or are informed by) the GSP
the diffusion of sparse signals. Alternatively, the model in (12) theme of this article. In accordance with our narrative’s leit-
can be enlarged by considering other linear graph signal opera- motif, we emphasize the key differences with the undirected
tors as generators, including the node- and edge-variant graph case and review the main challenges associated with the new
filters discussed in (5). Recent works have formulations.
also proposed nonlinear generative mod- Discovering directional We initiate our exposition with struc-
els that exploit results in the deep learning tural equation modeling, which broadly
literature to generate random signals over
influence among variables encapsulates a family of statistical methods
directed and undirected graphs. For exam- is at the heart of causal that describe causal relationships between
ple, one can take the architecture in (6)–(8), inference. interacting variables in a complex system.
replace Ti (,) " · G , with H (,) = R lL=, -01 h l S l,
(,) (,)
This is pursued through the estimation of
use a random realization of the white signal w as input, and linear relationships among endogenous as well as exogenous
then view the output of the GNN architecture as the ran- traits. SEMs have been extensively adopted in economics, psy-
dom process to be modeled. Although characterizing how chometrics, social sciences, and genetics, among other disci-
the coefficients {h (,)} ,L=N 1 affect the statistical properties of plines (see, e.g., [28]). SEMs postulate a linear time-invariant
the output is certainly relevant, equally interesting problems network model of the following form, where the GSO is speci-
arise when the goal is to use a set of realizations of the output fied as the adjacency matrix S = A:
x to learn the parameters of the nonlinear generative model
(i.e., the filter coefficients {h (,)} ,L=N 1 ) that best fit the avail- N
x it = / S ij x jt + ~ ii u it + e it, i ! N & x t = Sx t + Xu t + e t,
able observations. j = 1, j ! i
The statistical models briefly reviewed in this section (13)
accounted for nonsymmetric interactions among variables and
can be leveraged, for example, to enhance covariance estima- where x t = [x 1t, f, x Nt] T represents a graph signal of endog-
tion schemes, to denoise a set of observed graph signals, or to enous variables at discrete-time t and u t = [u 1t, f, u Nt] T is a
interpolate (predict) all the values of a graph signal using as vector of exogenous influences. The term Sx t in (13) models
input observations collected at a subset of nodes. Perhaps less network effects, implying that xit is a linear combination of
obvious but arguably equally important, the postulated mod- the instantaneous values xjt of node i’s in-neighbors j ! N i .
els can also be used to infer the graph itself. Indeed, if one The signal xit also depends on uit, where weight ~ ii captures
R
has access to a set X : = {x r} r = 1 of R realizations of x and the the level of influence of external sources, and we defined
graph is sufficiently sparse (so that the number of edges E is Ω := diag (~ 11, f, ~ NN ) . Vector e t represents measurement
2
much smaller than N ), one could identify the L + E degrees errors and unmodeled dynamics. Depending on the context,
of freedom in (12) from the RN values in X, provided that R is x t can be thought of as an output signal, while u t corresponds
sufficiently large. This is partially the subject of the next sec- to the excitation or control input. In the absence of noise and
tion, which deals with the problem of inferring the topology of letting Ω = I for simplicity, (13) becomes x t = Hu t, where
digraphs from a set of nodal observations. H := (I - S) -1 is a polynomial graph filter, as in (4).
Given snapshot observations X := {x t, u t} Tt = 1, SEM pa­­
Digraph topology inference rameters S and ~ := [~ 11, f, ~ NN ] T are typically estimated
Capitalizing on the GSP advances surveyed thus far requires via penalized least squares, for instance, by solving
a specification of the underlying digraph. However, G is often
T 2
unobservable and, accordingly, network topology inference St = argmin / x t - Sx t + Ωu t + a S 1,
from a set of (graph signal) measurements is a prominent yet S , ~ t=1 2

challenging problem, even more so when the graph at hand is s.t. Ω = diag (~), S ii = 0, i = 1, f, N, (14)
directed. Early foundational contributions can be traced back
several decades to the statistical literature of graphical model where the , 1 -norm penalty promotes sparsity in the adjacen-
selection (see, e.g., [4, Ch. 7] and the opening of the “Statis- cy matrix. Both edge sparsity and endogenous inputs play a
tical Digraph SP” section). Discovering directional influence critical role in guaranteeing that the SEM parameters (13) are
among variables is at the heart of causal inference, and iden- uniquely identifiable (see also [28]). Acknowledging the limi-
tifying the cause-and-effect digraphs (so-termed structural tations of linear models, [29] leverages kernels within the SEM
causal models) from observational data are a notoriously dif- framework to model nonlinear pairwise dependencies among
ficult problem [6, Ch. 7 and 10]. network nodes (see the “Applications” section for results on the
Recently, the fresh modeling and signal representation per- identification of gene-regulatory networks).
spectives offered by GSP have sparked renewed interest in Although SEMs capture only contemporaneous relation-
the field. Initial efforts have focused mostly on learning undi- ships among the nodal variables (i.e., SEMs are memoryless),

110 IEEE SIGNAL PROCESSING MAGAZINE | November 2020 |


Authorized licensed use limited to: UNIVERSITY OF ROCHESTER. Downloaded on October 29,2020 at 21:03:39 UTC from IEEE Xplore. Restrictions apply.
sparse vector autoregressive models (SVARMs) account for all l, ll ; 2) recovers a sparse S using the estimates {H t l} and
linear time-lagged (causal) influences instead (see, e.g., [30]). leverages the shift-invariant property of graph filters; and 3)
Specifically, for a given model order L and unknown sparse- estimates hr given {H t l, St } via the lasso. For full algorithmic
evolution matrices {S (l)} lL= 1, SVARMs postulate a multivari- details and their accompanying convergence analysis, see [31].
ate linear dynamical model of the form x t = R lL= 1 S (l) x t - l + e t . In [32], observations from M network processes are mod-
Here, a directed edge from vertex j to i is typically said to be eled as the outputs of a polynomial graph filter [i.e, x m = Hw m,
( l)
present in G if S ij ! 0 for all l = 1, f, L. The aforementioned as in (4)] excited by (unobservable) zero-mean independent
AND rule is often explicitly imposed as a constraint during graph signals w m with arbitrarily correlated nodal compo-
the estimation of SVARM parameters through the require- nents. Observations of the output signals along with prior sta-
ment that all matrices S (l) have a common support. This can tistical information on the inputs are first utilized to identify
be achieved, for instance, via a group lasso penalty, which pro- the nonsymmetric diffusion filter H. Such a problem entails
( 1) (L )
motes sparsity over edgewise coefficients s ij := [S ij , f, S ij ] T solving a system of quadratic matrix equations, which can be
jointly [30]. The sparsity assumption is often well justified due recast as a smooth quadratic minimization subject to Stiefel
to physical considerations or for the sake of interpretability, but manifold constraints (see [32] for details). Given an estimate
here (as well as with SEMs), it is also critical to reliably esti- Ht , the approach used in [32] to infer the digraph topology is to
mate G from limited and noisy time-series data X := {x t} Tt = 1 . find a generic GSO S that satisfies certain desirable topologi-
SVARMs are also central to popular digraph topology- cal properties and commutes with H. For instance, by focusing
identification approaches based on the principle of Granger on the recovery of sparse graphs, one solves
causality (see, e.g., [6, Ch. 10]). Said principle is based on
the concept of precedence and predictability, where node j’s St = argmin S 1, s.t. S ! S, t - SH
HS t
F # e, (17)
S
time series is said to “Granger-cause” the time series at node
i if the knowledge of {x j, t - l} lL= 1 improves the prediction of where S is a convex set specifying the type of GSO sought
xit compared to using only {x i, t - l} lL= 1 . Such a form of causal (say, the adjacency matrix of a digraph), and the constraint
dependence defines the status of a candidate edge from j to t S - SH
H t # e encourages the filter H to be a polynomial
F
i, and it can be assessed via judicious hypothesis testing [28]. S
in while accounting for estimation errors (see [2], [12], and
Recently, a notion different from Granger’s was advocated to [31]). Imposing this last constraint offers an important depar-
associate a graph with causal network effects among vertex ture from the related (undirected) graph-learning algorithms
time series, effectively blending VARMs with graph filter- in [12], [33], and [2, Ch. 13], which identify the structure of
based dynamical models. The so-termed causal graph pro- network-diffusion processes from observations of stationary
cess (CGP) introduced in [31] also considers S = A and has signals [cf. (12) but with symmetrical H]. These approaches
the form first estimate the eigenvectors of H and then constrain S to
be diagonalized by those eigenvectors in a convex problem
L l
xt = / / h li S i x t - l + e t = (h 10 I + h 11 S) x t - 1 to recover the unknown eigenvalues. Although this naturally
l=1 i=0  entails a search over the lower-dimensional space of GSO
+ g + (h L0 I + g + h LL S L) x t - L + e t, (15) eigenvalues, the formulation (17) avoids computing an ei-
gendecomposition and, more importantly, solving a problem
where S is the (possibly asymmetric) adjacency matrix en- over complex-valued variables. This was not an issue in [33]
coding the unknown graph topology. The CGP model cor- because the focus therein was on undirected graphs with real-
responds to a generalized VARM with coefficients given by valued spectrums. In closing, note that the graph-filtering
H l (S, hr ) := R ii = 0 h li S i, where hr : = [h 10, h 11, f, h li, fh LL] T. model advocated in [32] is a special case of (15) provided that
This way, the model can possibly account for multihop nodal S = A and, instead of multivariate time-series data, one relies
influences per time step. Unlike SVARMs, matrices H l (S, hr ) on independent replicates from multiple network processes
need not be sparse for larger values of l, even if S is itself (obtained, e.g., via interventions, as in causal inference [6]).
sparse. Given data X := {x t} Tt = 1 and a prescribed value of L,
to estimate S, one solves the nonconvex optimization problem Applications
T L
2 We highlight four real-world applications of the methods sur-
St = argmin / xt - / H l (S, hr ) x t - l +a S 1 + b hr 1 . veyed in this article. The experiments were chosen to demon-
S, hr t = L+1 l=1
strate the practical value of SP schemes applied to digraphs,
(16)
with the diversity of the data sets considered (climate records,
Similar to the sparse SEMs in (14) and SVARMs, the estima- text excerpts, handwritten characters, and gene-expression lev-
tor encourages sparse graph topologies. Moreover, the , 1 -norm els) underscoring the versatility of the tools.
regularization on the filter coefficients hr effectively imple-
ments a form of model-order selection. A divide-and-conquer Frequency analysis for temperature-signal denoising
heuristic is advocated in [31] to tackle the challenging problem We consider a digraph G of the N = 48 contiguous United
(16), whereby one 1) identifies the filters H l := H l (S, hr ) so that States (Alaska and Hawaii are excluded). A directed edge joins
x t . R lL= 1 R li = 0 H l x t - l, exploiting that H l and H ll commute for two states if they share a border, and the edge direction is set

IEEE SIGNAL PROCESSING MAGAZINE | November 2020 | 111


Authorized licensed use limited to: UNIVERSITY OF ROCHESTER. Downloaded on October 29,2020 at 21:03:39 UTC from IEEE Xplore. Restrictions apply.
so that the state whose barycenter is more to the south points graph signal y superimposed with the denoised temperature
to the one more to the north. As the graph signal x ! R 48, profile xt , and it can be seen that, indeed, xt closely approxi-
we consider the average annual temperature of each state [see mates x. The recovery error increases when the edge directions
Figure 3(a)]. The temperature map confirms that the latitude are ignored (i.e., G is treated as undirected) and when they are
affects the average temperatures of the states, justifying the selected randomly (i.e., every edge is directed but the specific
proposed latitude-based graph-construction scheme. orientation is chosen uniformly at random between the two
We determine a GFT basis U for this digraph via spec- possibilities), as opposed to following the south-to-north orien-
tral-dispersion minimization as in (2) and test its utility in a tation that captures the temperate flow (see [14] for additional
denoising task. More specifically, our goal is to recover the details and experiments).
temperature signal from noisy measurements y = x + w y,
where the additive noise w y is a zero-mean, Gaussian ran- GNNs for authorship attribution
dom vector with covariance matrix 10I N . To achieve this, We illustrate the performance of GNNs for classification in
we implement a low-pass graph filter that retains the first K digraphs through an authorship attribution problem based on
components of the signal’s DGFT and eliminates the rest, i.e., real data. The goal is, using a short text excerpt as input, to
hu = [hu 1, f, hu N ] T , where hu k = I " k # K ,, and K is a prescribed decide whether the text was written by a particular author. To
spectral window size. Hence, we estimate the true temperature capture the style of an author, we consider author-specific word
signal as xt = Udiag (hu ) yu = Udiag (hu ) U T y. adjacency networks (WANs), which are digraphs whose nodes
The original signal x is bandlimited compared to the noisy are function words (i.e., prepositions, pronouns, conjunctions,
signal y, which spans a broader range of frequencies [see Fig- and other words with syntactic importance but little semantic
ure 3(b)]. To better observe the low-pass property of x, we also meaning [34]) and whose edges represent probabilities of di-
plot the cumulative energy of both x and y, defined by the per- rected coappearance of two function words within texts writ-
centage of the total energy present in the first k frequency com- ten by the author [see Figure 4(a)].
ponents for k = 1, f, N. Setting the spectral window at K = 3, We select N = 211 functions words as nodes and build the
the average recovery error e f = xt - x x determined WAN for Emily Brontë. More specifically, we count the num-
by 1,000 Monte Carlo simulations of independent noise was ber of times each pair of function words coappear in 10-word
approximately 12%. Figure 3(c) shows a realization of the noisy windows while also recording their relative order. We then

70

40

(a)

400 100 True Signal


350 1 Noisy Signal
Cumulative

80 Recovered Signal
Energy

300 0.98
Signal Value

250 60
DGFT

0.96
200
150 0.94 40
0 10 20 30 40 50
100
50 ; ~x; ; ~y; 20

0 0
0 10 20 30 40 50 5 10 15 20 25 30 35 40 45
Index Node
(b) (c)

FIGURE 3. Temperature denoising using the DGFT [14]. (a) The graph signal of average annual temperature in Fahrenheit for the contiguous United States.
In the depicted digraph, a directed edge joins two states if they share a border, and the edge directions go from south to north. (b) The DGFT of the origi-
nal signal ^ xu h and the noisy signal ^ yu h along with their cumulative energy distribution across frequencies. (c) A sample realization of the true, noisy, and
recovered temperature signals for a filter bandwidth K = 3.

112 IEEE SIGNAL PROCESSING MAGAZINE | November 2020 |


Authorized licensed use limited to: UNIVERSITY OF ROCHESTER. Downloaded on October 29,2020 at 21:03:39 UTC from IEEE Xplore. Restrictions apply.
normalize the counts out of each node to sum up to one, thus scores the importance of leveraging the digraph structure in
obtaining a weighted digraph whose weights are between 0 and the architecture of GNNs, not only in the linear operators via
1. As for the graph signals, they are defined as each function the incorporation of graph filters but also in the determination
word’s count among 1,000 words. Splitting Emily Brontë’s of nonlinearities.
texts between training and test sets on an 80:20 ratio, her WAN
is generated from function word coappearance counts in the Graph sampling for handwritten digit recognition
training set only. The graph signals in the training set corre- Our goal here is to employ the sampling theory introduced in
spond to 1,000-word excerpts by Brontë and by a pool of 21 the “Sampling and Reconstruction” section to classify hand-
other contemporary authors. Each set of graph signals has an written digits with minimal labels, as developed in [21]. More
associated binary label where 1 indicates that the text has been precisely, we consider a digraph whose N = 10, 000 nodes
written by Brontë and excerpts by the rest of the authors are correspond to gray-scaled images in the Modified National In-
labeled as 0. Test samples are defined analogously. The train- stitute of Standards and Technology (MNIST) data set equally
ing and test sets consisted of 1,092 and 272 excerpts, respec- distributed among the 10 classes (0–9-digit characters). The
tively, both with equally balanced classes, and cross entropy edges are obtained from a 12-nearest-neighbor construction
was chosen as the loss function. computed from the Euclidean distance between vector repre-
Several specific GNN architectures were compared in this sentations of the images. The graph is directed by construction
experiment, all of which followed the general structure in (6)– because one node being in the 12-nearest neighborhood of an-
(,)
(8) but for different choices of the nonlinearity v G . Indeed, the other node does not guarantee that the relation in the opposite
popular pointwise rectified linear unit (ReLU) was contrasted direction holds. This directionality can be especially relevant
with more sophisticated graph-localized (but not necessarily in the treatment of outliers in the embedded space, where every
pointwise) median and maximum activation functions (see outlier still has an incoming neighborhood of size 12 but does
[34] for details). Figure 4(b) presents the authorship attribution not belong to the incoming neighborhood of other nodes, thus
accuracy results after conducting 10 rounds of simulations by having a minimal effect in the label propagation. The edges
varying the training and test splits. We can see that the median are then weighted using a normalized Gaussian kernel so that,
and maximum GNNs performed consistently better than the within each neighborhood of size 12, the closer connections
ReLU GNNs on discerning between texts written by Brontë have a larger weight. Intuitively, images representing the same
and other authors in the pool. Localized activation functions digit tend to have similar pixel values and, hence, are more
outperformed the pointwise ReLU, with smaller average test likely to belong to the neighborhood of each other. Thus, if
errors as well as smaller deviations around this average. Equal- we consider the value of the signal at a given node to be the
ly important, the simulations also show that their associated digit represented by the image associated with that node, the
error was 1–2% lower than that achieved by NN architectures whole graph signal will be piecewise constant in the graph
that symmetrized the WAN. This superior performance under- and thus amenable to being reconstructed from observations
from

can
for
if

0.2
in

by
it

t
lik

bu
e
m

mo
at
ay

r e as
mu d
st an 0.15
no an
Test Error

of all
on a
yet 0.1
one
or wou
ld
r wit
ou h
all wi 0.05
ll
sh
w en

.
so

ax

ed

ax

ed
hi

eL
wh

m
m

m
t

ch
tha

h
h

h
the

wh
then

upo
they

2
1

2
this
to

at

Activation Function
n

(a) (b)

FIGURE 4. Identifying the author of a text using GNNs [34]. (a) An example of a WAN with 40 function words as nodes built from the play “The
­ umorous Lieutenant” by John Fletcher. The radius of the nodes is proportional to the word count, and the darker the edge color the higher the
H
edge weight. Directionality has been ignored for ease of representation, but the GNNs are defined on the directed WAN. (b) The authorship
attribution test error in GNN architectures with localized activation functions for the classification of Emily Brontë versus her contemporaries.
Max.: maximum; med.: median.

IEEE SIGNAL PROCESSING MAGAZINE | November 2020 | 113


Authorized licensed use limited to: UNIVERSITY OF ROCHESTER. Downloaded on October 29,2020 at 21:03:39 UTC from IEEE Xplore. Restrictions apply.
at a few nodes. Furthermore, to account for the fact that the As expected, the accuracy increases with the number of
signal values are categorical, instead of considering a graph samples. Furthermore, note that even when observing only
signal of dimension x ! R N , we consider the alternative bi- 50 samples (0.5 and 0.45% of the MNIST and USPS data
nary matrix representation X ! R N # 10, where X ij = + 1 if sets, respectively), the reconstruction accuracy is nearly
the ith image is a picture of the digit j and X ij = - 1 other- 0.9, highlighting the importance of incorporating the
wise. Each column of X is modeled as a band-limited signal graph structure via optimal samplers that can accommo-
that can be written as the linear combination of the K leading date digraphs. This method was shown to outperform other
columns of V, that is, the eigenvectors of the nonsymmetric graph-based, active semisupervised learning techniques
adjacency matrix A. (see [21] for additional details and experiments).
The graph representation of the MNIST digits is shown
in Figure 5(a), where the edges were removed for clarity, Kernel-based topology inference for gene-expression data
and the coordinates of each node are given by the corre- Consider now the problem of identifying gene-regulatory to-
sponding rows of the first three columns of the iGFT V. pologies, where nodes represent individual genes and directed
The enlarged black nodes indicate the optimal choice for edges encode causal regulatory relationships between gene
10 samples. Optimality, in this case, refers to the design of pairs. Due to the inherent directional nature of regulatory in-
C M to maximize the minimum singular value of C M VK teractions [4, Ch. 7.3], we must recover a digraph as opposed to
(see the “Sampling and Reconstruction” section). Given an undirected relational structure. In this context, we compare
that we have to (pseudo-)invert this matrix for reconstruc- the inferred digraphs recovered when implementing different
tion, a good condition number entails a robust behavior in kernels for SEM inference. The experiments were performed
the presence of noise. It is apparent that the images repre- on gene-regulatory data collected from T = 69 unrelated
senting the same digit form clusters and that the optimal Nigerian individuals under the International HapMap project
samples boil down to choosing representative samples from (see [29] and the references therein for additional details). From
each cluster. The same procedure can be repeated for the the 929 identified genes, expression levels and genotypes of the
United States Postal Service’s (USPS’) handwritten digits expression quantitative trait loci (eQTLs) of N = 39 immune-
data set consisting of N = 11, 000 images to obtain Fig- related genes were selected and normalized. The genotypes of
ure 5(b). For both cases, one can compute the classification eQTLs were considered as exogenous inputs u t, whereas the
accuracy obtained from the reconstructed graph signals gene-expression levels were treated as the endogenous vari-
for a different number of optimal samples [see Figure 5(c)]. ables x t [compare (13)].
Figure 6 depicts the identified topologies, where the dif-
ferent graphs correspond to different choices for the kernel,
and the visualizations only include nodes that have at least a
single incoming or outgoing edge. More precisely, Figure 6(a)
portrays the resulting network based on a linear SEM, while
Figure 6(b) and (c) illustrate the results from nonlinear SEMs
based on a polynomial kernel of second order and a Gauss-
MNIST
ian kernel with unit variance, respectively. In the three cases,
USPS the identified networks are very sparse, and the nonlinear
(a) (b) approaches unveil all of the edges identified by the linear
SEMs, along with a number of additional edges. Clearly, con-
1
sidering the possibility that interactions among genes may be
Classification Accuracy

0.9 driven by nonlinear dynamics, nonlinear frameworks encom-


pass linear approaches and facilitate the discovery of causal
0.8 (directed) patterns not captured by linear SEMs. The newly
0.7 unveiled gene-regulatory interactions could potentially be the
subject of further studies and direct experimental corrobora-
0.6 MNIST tion by geneticists to improve our understanding of causal
USPS influences among immune-related genes across humans.
0.5
0 20 40 60 80 100
Number of Samples Emerging-topic areas and conclusions
(c) Contending that signals defined on digraphs are of paramount
practical importance, this article outlined recent approaches to
FIGURE 5. Semisupervised learning for handwritten digit classification via model, process, and learn from these graph signals. Accord-
the sampling of graph signals [21]. (a) A 3D representation of the MNIST ingly, this article stretched in a comprehensive and unifying
images colored by true class (digits 0–9). The 10 enlarged nodes cor-
respond to the identified optimal samples. (b) Analogous to (a), but for the
manner all the way from the definition of GFTs and graph sig-
USPS data set. (c) Classification accuracy as a function of the number of nal operators especially designed for digraphs to the problem of
samples used for interpolation for both data sets. inferring the digraph itself from the observed signals. A wide

114 IEEE SIGNAL PROCESSING MAGAZINE | November 2020 |


Authorized licensed use limited to: UNIVERSITY OF ROCHESTER. Downloaded on October 29,2020 at 21:03:39 UTC from IEEE Xplore. Restrictions apply.
IGHV3-11
HLA-DRB5 HLA-DQB1

HLA-DQA2 HLA-DRE

IL4I1 HLA-DPA1

HLA-DRB5 HLA-A HLA-DQB1


IL4I1
(a) FCRLA HLA-DRB5
TBRG4 HLA-DQB1

IL24 HLA-DPA1
HLA-DOA2 IGHV3-11

IL4I1 IGHV3-11
FCRLA HLA-DRB4

TAXIBP1 HLA-A TBRG4 HLA-A


HLA-DPA1 HLA-DRB4
(b) (c)

FIGURE 6. Inferring (directed) gene-regulatory networks from expression data [29]. The networks are inferred following the SEM formulation in (14) for (a)
a linear kernel, (b) a polynomial kernel of order two, and (c) a Gaussian kernel with unit variance.

range of signal-recovery problems was selectively covered, years, and part of that success can be extended to our more
­focusing on inverse problems in digraphs, including sampling, challenging domain. Equally interesting is that the use of deep
deconvolution, and system identification. A statistical view- learning to generate the graphs themselves (as opposed to the
point for signal modeling was also discussed by extending the graph signals) has recently gained traction so that, as discussed
definition of weak stationarity of random graph processes to in this article, one can conceive NN architectures that learn
the directed domain. The last stop was to review recent results (and even generate) digraphs from training graph signals while
that applied the tools surveyed in this article to the problem of encoding desirable topological features. One final direction of
learning the topology of a digraph from nodal observations, future research is the extension of the concepts discussed in this
an approach that can lead to meaningful connections between article to the case of higher-order directed relational structures.
GSP and the field of causal inference in statistics. A common The generalization of GSP to hypergraphs through tensor mod-
theme in the extension of established GSP concepts to the less- els and simplicial complexes has been explored in recent years,
explored realm of digraphs is that the definitions and notions but their analysis in directed scenarios is almost uncharted
that rely heavily on spectral properties are challenging to gen- research territory.
eralize, whereas those that can be explicitly postulated in the
vertex domain are more amenable to be extended to digraphs. Acknowledgments
A diverse gamut of potential research avenues naturally fol- The work presented in this article was supported by the Span-
lows from the developments presented. Efficient approaches ish Federal Grants Klinilycs and SPGraph (TEC2016-75361-R
for the computation of the multiple GFTs for digraphs (akin and PID2019-105032GB-I00) and NSF Awards CCF-1750428
to the fast Fourier transform in classical SP) would facilitate and ECCS-1809356. Antonio G. Marques is the correspond-
the adoption of this methodology in large-scale settings. The ing author.
incorporation of nonlinear (median, Volterra, and NNs) graph
signal operators as generative models for the solution of inverse Authors
problems is another broad area of promising research. Deep Antonio G. Marques ([email protected])
generative models for signals defined in regular domains (such received his five-year diploma and doctorate degree in tele-
as images) have shown remarkable success over the last several communications engineering (both with highest honors) from

IEEE SIGNAL PROCESSING MAGAZINE | November 2020 | 115


Authorized licensed use limited to: UNIVERSITY OF ROCHESTER. Downloaded on October 29,2020 at 21:03:39 UTC from IEEE Xplore. Restrictions apply.
Carlos III University of Madrid, Spain, in 2002 and 2007, [12] G. Mateos, S. Segarra, A. G. Marques, and A. Ribeiro, “Connecting the dots:
Identifying network structure via graph signal processing,” IEEE Signal Process.
respectively. Currently, he is a full professor with the De­­ Mag., vol. 36, no. 3, pp. 16–43, 2019. doi: 10.1109/MSP.2018.2890143.
partment of Signal Theory and Communications, King Juan [13] S. Sardellitti, S. Barbarossa, and P. Di Lorenzo, “On the graph Fourier trans-
Carlos University, Madrid, Spain. He is a member of EURASIP form for directed graphs,” IEEE J. Sel. Topics Signal Process., vol. 11, no. 6, pp.
796–811, 2017. doi: 10.1109/JSTSP.2017.2726979.
and the ELLIS society. He was also the recipient of the 2020
[14] R. Shafipour, A. Khodabakhsh, G. Mateos, and E. Nikolova, “A directed graph
EURASIP Early Career Award. His research interests lie in Fourier transform with spread frequency components,” IEEE Trans. Signal
the areas of signal processing, machine learning, and network Process., vol. 67, no. 4, pp. 946–960, 2019. doi: 10.1109/TSP.2018.2886151.
[15] S. Segarra, A. G. Marques, and A. Ribeiro, “Optimal graph-filter design and
science. He is a Member of IEEE. applications to distributed linear network operators,” IEEE Trans. Signal Process.,
Santiago Segarra ([email protected]) received his B.Sc. vol. 65, no. 15, pp. 4117–4131, 2017. doi: 10.1109/TSP.2017.2703660.
degree in industrial engineering with highest honors (valedicto- [16] M. Coutino, E. Isufi, and G. Leus, “Advances in distributed graph filtering,”
IEEE Trans. Signal Process., vol. 67, no. 9, pp. 2320–2333, 2019. doi: 10.1109/
rian) from the Buenos Aires Institute of Technology, Argentina, TSP.2019.2904925.
in 2011, his M.Sc. degree in electrical engineering from the [17] S. Segarra, A. G. Marques, G. R. Arce, and A. Ribeiro, “Center-weighted
University of Pennsylvania (Penn), Philadelphia, in 2014, and median graph filters,” in Proc. IEEE Global Conf. Signal and Information
Pro ces s i ng (G l o b a l S I P), D e c. 2 016 , p p. 336 –3 4 0. doi: 10.110 9/
his Ph.D. degree in electrical and systems engineering from GlobalSIP.2016.7905859.
Penn in 2016. Currently, he is the W.M. Rice Trustee Assistant [18] F. Gama, A. G. Marques, G. Leus, and A. Ribeiro, “Convolutional neural net-
Professor in the Department of Electrical and Computer work architectures for signals supported on graphs,” IEEE Trans. Signal Process.,
vol. 67, no. 4, pp. 1034–1049, 2019. doi: 10.1109/TSP.2018.2887403.
Engineering at Rice University, with a courtesy appointment in [19] M. M. Bronstein, J. Bruna, Y. LeCun, A. Szlam, and P. Vandergheynst,
computer science. His research interests include network theory, “Geometric deep learning: Going beyond Euclidean data,” IEEE Signal Process.
Mag., vol. 34, no. 4, pp. 18–42, 2017. doi: 10.1109/MSP.2017.2693418.
data analysis, machine learning, and graph signal processing.
[20] Z. Wu, S. Pan, F. Chen, G. Long, C. Zhang, and P. S. Yu, “A comprehensive
Gonzalo Mateos ([email protected]) received survey on graph neural networks,” IEEE Trans. Neural Netw. Learn. Syst., early
his B.Sc. degree in electrical engineering from the University access, Mar. 24, 2020. doi: 10.1109/TNNLS.2020.2978386.
of the Republic, Montevideo, Uruguay, in 2005 and his [21] S. Chen, R. Varma, A. Sandryhaila, and J. Kovacevi , “Discrete signal pro-
cessing on graphs: Sampling theory,” IEEE Trans. Signal Process., vol. 63, no. 24,
M.Sc. and Ph.D. degrees in electrical engineering from the pp. 6510–6523, 2015. doi: 10.1109/TSP.2015.2469645.
University of Minnesota, Minneapolis, in 2009 and 2012, [22] S. Segarra, A. G. Marques, G. Leus, and A. Ribeiro, “Reconstruction of graph
respectively. Currently, he is an associate professor with the signals through percolation from seeding nodes,” IEEE Trans. Signal Process., vol.
64, no. 16, pp. 4363–4378, 2016. doi: 10.1109/TSP.2016.2552510.
Department of Electrical and Computer Engineering, the [23] L. Le Magoarou, R. Gribonval, and N. Tremblay, “Approximate fast graph
University of Rochester, New York, where he is also the Asaro Fourier transforms via multilayer sparse approximations,” IEEE Trans. Signal
Inf. Process. Netw., vol. 4, no. 2, pp. 407–420, 2017. doi: 10.1109/TSIPN.2017.
Biggar Family Fellow in Data Science. His research interests 2710619.
lie in the areas of statistical learning from graph data; network [24] A. G. Marques, S. Segarra, G. Leus, and A. Ribeiro, “Sampling of graph sig-
science; decentralized optimization; and graph signal process- nals with successive local aggregations,” IEEE Trans. Signal Process., vol. 64, no.
7, pp. 1832–1843, 2016. doi: 10.1109/TSP.2015.2507546.
ing with applications to social, communication, power grid,
[25] S. Chen, A. Sandryhaila, J. M. F. Moura, and J. Kovacevi , “Signal recovery
and brain network analytics. on graphs: Variation minimization,” IEEE Trans. Signal Process., vol. 63, no. 17,
pp. 4609–4624, 2015. doi: 10.1109/TSP.2015.2441042.

References [26] F. J. Iglesias, S. Segarra, S. Rey-Escudero, A. G. Marques, and D. Ramirez,


“Demixing and blind deconvolution of graph-diffused sparse signals,” in Proc.
[1] A. Ortega, P. Frossard, J. Kovacevi , J. M. F. Moura, and P. Vandergheynst,
“Graph signal processing: Overview, challenges, and applications,” Proc. IEEE, vol. IEEE Int. Conf. Acoustics, Speech and Signal Processing (ICASSP), Apr. 2018, pp.
106, no. 5, pp. 808–828, 2018. doi: 10.1109/JPROC.2018.2820126. 4189–4193. doi: 10.1109/ICASSP.2018.8462154.

[2] P. Djuric and C. Richard, Cooperative and Graph Signal Processing: [27] Y. Zhu, F. J. Iglesias, A. G. Marques, and S. Segarra, “Estimation of network
Principles and Applications. San Diego, CA: Academic Press, 2018. processes via blind graph multi-filter identification,” in Proc. IEEE Int. Conf.
Acoustics, Speech and Signal Processing (ICASSP), May 2019, pp. 5451–5455.
[3] D. Shuman, S. Narang, P. Frossard, A. Ortega, and P. Vandergheynst, “The doi: 10.1109/ICASSP.2019.8683844.
emerging field of signal processing on graphs: Extending high-dimensional data
analysis to networks and other irregular domains,” IEEE Signal Process. Mag., vol. [28] G. B. Giannakis, Y. Shen, and G. V. Karanikolas, “Topology identification and
30, no. 3, pp. 83–98, 2013. doi: 10.1109/MSP.2012.2235192. learning over graphs: Accounting for nonlinearities and dynamics,” Proc. IEEE, vol.
106, no. 5, pp. 787–807, 2018. doi: 10.1109/JPROC.2018.2804318.
[4] E. D. Kolaczyk, Statistical Analysis of Network Data: Methods and Models.
New York: Springer-Verlag, 2009. [29] Y. Shen, B. Baingana, and G. B. Giannakis, “Kernel-based structural equation
models for topology identification of directed networks,” IEEE Trans. Signal
[5] S. Segarra, G. Mateos, A. G. Marques, and A. Ribeiro, “Blind identification of Process., vol. 65, no. 10, pp. 2503–2516, 2017. doi: 10.1109/TSP.2017.2664039.
graph filters,” IEEE Trans. Signal Process., vol. 65, no. 5, pp. 1146–1159, 2017. doi:
10.1109/TSP.2016.2628343. [30] A. Bolstad, B. D. V. Veen, and R. Nowak, “Causal network inference via group
sparse regularization,” IEEE Trans. Signal Process., vol. 59, no. 6, pp. 2628–2641,
[6] J. Peters, D. Janzing, and B. Schölkopf, Elements of Causal Inference: 2011. doi: 10.1109/TSP.2011.2129515.
Foundations and Learning Algorithms. Cambridge, MA: MIT Press, 2017.
[31] J. Mei and J. M. F. Moura, “Signal processing on graphs: Causal modeling of
[7] A. Sandryhaila and J. M. F. Moura, “Discrete signal processing on graphs,” unstructured data,” IEEE Trans. Signal Process., vol. 65, no. 8, pp. 2077–2092,
IEEE Trans. Signal Process., vol. 61, no. 7, pp. 1644–1656, 2013. doi: 10.1109/ 2017. doi: 10.1109/TSP.2016.2634543.
TSP.2013.2238935.
[32] R. Shafipour, S. Segarra, A. G. Marques, and G. Mateos, “Directed network
[8] A. Sandryhaila and J. M. F. Moura, “Discrete signal processing on graphs: topology inference via graph filter identification,” in Proc. IEEE Data Science
Frequency analysis,” IEEE Trans. Signal Process., vol. 62, no. 12, pp. 3042–3054, Workshop (DSW), June 2018, pp. 210–214. doi: 10.1109/DSW.2018.8439888.
2014. doi: 10.1109/TSP.2014.2321121.
[33] S. Segarra, A. G. Marques, G. Mateos, and A. Ribeiro, “Network topology
[9] H. Sevi, G. Rilling, and P. Borgnat, “Harmonic analysis on directed graphs and inference from spectral templates,” IEEE Trans. Signal Inf. Process. Netw., vol. 3,
applications: From Fourier analysis to wavelets,” 2018, arXiv:1811.11636v2. no. 3, pp. 467–483, 2017. doi: 10.1109/TSIPN.2017.2731051.
[10] F. Chung, “Laplacians and the Cheeger inequality for directed graphs,” Ann. [34] L. Ruiz, F. Gama, A. G. Marques, and A. Ribeiro, “Invariance-preserving
Comb., vol. 9, no. 1, pp. 1–19, 2005. doi: 10.1007/s00026-005-0237-z. localized activation functions for graph neural networks,” IEEE Trans. Signal
[11] J. A. Deri and J. M. F. Moura, “Spectral projector-based graph Fourier trans- Process., vol. 68, pp. 127–141, Nov. 2019. doi: 10.1109/TSP.2019.2955832.
forms,” IEEE J. Sel. Topics Signal Process., vol. 11, no. 6, pp. 785–795, 2017. doi: 
10.1109/JSTSP.2017.2731599. SP

116 IEEE SIGNAL PROCESSING MAGAZINE | November 2020 |


Authorized licensed use limited to: UNIVERSITY OF ROCHESTER. Downloaded on October 29,2020 at 21:03:39 UTC from IEEE Xplore. Restrictions apply.

You might also like