2019 - Medium - Tutorial On Graph Neural Networks For Computer Vision and Beyond - by Boris Knyazev

This document summarizes a tutorial on graph neural networks (GNNs) for computer vision and other domains. It discusses why graphs are useful representations, why defining convolution on graphs is difficult, and what makes a neural network a GNN. Specifically, graphs can help solve challenging problems, provide flexibility in data representation, and naturally represent relational data. Defining convolution on graphs is difficult because graphs lack the grid-like structure of images. A neural network is a GNN if it operates on graph-structured data and passes messages between nodes. The tutorial provides examples and code to illustrate GNN concepts.

4/21/23, 3:50 PM Tutorial on Graph Neural Networks for Computer Vision and Beyond | by Boris Knyazev | Medium


Boris Knyazev

Aug 4, 2019 · 17 min read


Tutorial on Graph Neural Networks for Computer Vision and Beyond
I’m answering questions that AI/ML/CV people not familiar with graphs or graph neural
networks typically ask. I provide PyTorch examples to clarify the idea behind this relatively
new and exciting kind of model.


https://medium.com/@BorisAKnyazev/tutorial-on-graph-neural-networks-for-computer-vision-and-beyond-part-1-3d9fada3b80d 1/21

A figure from (Bruna et al., ICLR, 2014) depicting an MNIST image on the 3D sphere. While it’s hard to adapt
Convolutional Networks to classify spherical data, Graph Networks can naturally handle it. This is a toy
example, but similar tasks arise in many real applications.

The questions addressed in this part of my tutorial are:

1. Why are graphs useful?

2. Why is it difficult to define convolution on graphs?

3. What makes a neural network a graph neural network?

To answer them, I'll provide motivating examples, papers and Python code, making this a tutorial on Graph Neural Networks (GNNs). Some basic knowledge of machine learning and computer vision is expected; however, I'll provide some background and intuitive explanations as we go.

First of all, let's briefly recall what a graph is. A graph G is a set of nodes (vertices) connected by directed or undirected edges. Nodes and edges typically come from some expert knowledge or intuition about the problem. So, they can be atoms in molecules, users in a social network, cities in a transportation system, players in a team sport, neurons in the brain, interacting objects in a dynamic physical system, or pixels, bounding boxes or segmentation masks in images. In other words, in many practical cases, it is actually you who gets to decide what the nodes and edges in a graph are.

This is a very flexible data structure that generalizes many other data structures. For
example, if there are no edges, then it becomes a set; if there are only “vertical”
edges and any two nodes are connected by exactly one path, then we have a tree.
Such flexibility is both good and bad as I’ll discuss in this tutorial.
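To make the set and tree special cases concrete, here is a minimal pure-Python sketch (the function classify_graph and the example edge lists are illustrative, not part of the tutorial's code). It assumes a simple undirected graph without duplicate edges or self-loops, and relies on the fact that such a graph is a tree exactly when it is connected and has N - 1 edges.

```python
def classify_graph(num_nodes, edges):
    """Return 'set', 'tree' or 'graph' for a simple undirected graph given as an edge list."""
    if not edges:
        return "set"  # no edges: the nodes are just an unordered collection
    # Build adjacency lists for the undirected graph.
    neighbors = {v: set() for v in range(num_nodes)}
    for u, v in edges:
        neighbors[u].add(v)
        neighbors[v].add(u)
    # Depth-first search from node 0 to test connectivity.
    visited, stack = {0}, [0]
    while stack:
        u = stack.pop()
        for v in neighbors[u]:
            if v not in visited:
                visited.add(v)
                stack.append(v)
    connected = len(visited) == num_nodes
    # Connected + (N - 1) edges <=> exactly one path between any two nodes.
    if connected and len(edges) == num_nodes - 1:
        return "tree"
    return "graph"

print(classify_graph(4, []))                        # set
print(classify_graph(4, [(0, 1), (0, 2), (2, 3)]))  # tree
print(classify_graph(3, [(0, 1), (1, 2), (2, 0)]))  # graph (contains a cycle)
```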


Two undirected graphs with 5 and 6 nodes. The order of nodes is arbitrary.

1. Why are graphs useful?

In the context of computer vision (CV) and machine learning (ML), studying graphs
and the models to learn from them can give us at least four benefits:

1. We can become closer to solving important problems that previously were too
challenging, such as: drug discovery for cancer (Veselkov et al., Nature, 2019);
better understanding of the human brain connectome (Diez & Sepulcre, Nature
Communications, 2019); materials discovery for energy and environmental
challenges (Xie et al., Nature Communications, 2019).

2. In most CV/ML applications, data can actually be viewed as graphs, even though you may be used to representing them as another data structure. Representing your data as graph(s) gives you a lot of flexibility and can give you a very different and interesting perspective on your problem. For instance, instead of learning from image pixels, you can learn from "superpixels" as in (Liang et al., ECCV, 2016) and in our forthcoming BMVC paper. Graphs also let you impose a relational inductive bias on the data, i.e. some prior knowledge you have about the problem. For instance, if you want to reason about a human pose, your relational bias can be a graph of skeleton joints of a human body (Yan et al., AAAI, 2018); or if you want to reason about videos, your relational bias can be a graph of moving bounding boxes (Wang & Gupta, ECCV, 2018). Another example is representing facial landmarks as a graph (Antonakos et al., CVPR, 2015) to reason about facial attributes and identity.


3. Your favourite neural network itself can be viewed as a graph, where nodes are neurons and edges are weights, or where nodes are layers and edges denote the flow of the forward/backward pass (in which case we are talking about a computational graph used in TensorFlow, PyTorch and other DL frameworks). Applications include optimization of a computational graph, neural architecture search, analyzing training behavior, etc.

4. Finally, you can more effectively solve many problems where data are more naturally represented as graphs. This includes, but is not limited to, molecule and social network classification (Knyazev et al., NeurIPS-W, 2018) and generation (Simonovsky & Komodakis, ICANN, 2018), 3D mesh classification and correspondence (Fey et al., CVPR, 2018) and generation (Wang et al., ECCV, 2018), modeling the behavior of dynamic interacting objects (Kipf et al., ICML, 2018), visual scene graph modeling (see the upcoming ICCV Workshop) and question answering (Narasimhan, NeurIPS, 2018), program synthesis (Allamanis et al., ICLR, 2018), different reinforcement learning tasks (Bapst et al., ICML, 2019) and many other exciting problems.

As my previous research was related to recognizing and analyzing faces and emotions, I particularly like the figure below.


A figure from (Antonakos et al., CVPR, 2015) showing representation of a face as a graph of landmarks. This
is an interesting approach, but it is not a sufficient facial representation in many cases, since a lot can be told
from the face texture captured well by convolutional networks. In contrast, reasoning over 3D meshes of a
face looks like a more sensible approach compared to 2D landmarks (Ranjan et al., ECCV, 2018).

2. Why is it difficult to define convolution on graphs?

To answer this question, I first give some motivation for using convolution in general and then describe "convolution on images" using graph terminology, which should make the transition to "convolution on graphs" smoother.

2.1. Why is convolution useful?

Let’s understand why we care about convolution so much and why we want to use it
for graphs. Compared to fully-connected neural networks (a.k.a. NNs or MLPs),
convolutional networks (a.k.a. CNNs or ConvNets) have certain advantages
explained below based on the image of a nice old Chevy.


“Chevrolet Vega” according to Google Image Search.

First, ConvNets exploit natural priors in images, described more formally in (Bronstein et al., 2016), such as:

1. Shift-invariance: if we translate the car in the image above to the left/right/up/down, we should still be able to detect and recognize it as a car. This is exploited by sharing filters across all locations, i.e. applying convolution.

2. Locality: nearby pixels are closely related and often represent some semantic concept, such as a wheel or a window. This is exploited by using filters that are small relative to the image, which capture image features in a local spatial neighborhood.

3. Compositionality (or hierarchy): a larger region in the image is often a semantic parent of the smaller regions it contains. For example, a car is a parent of doors, windows, wheels, driver, etc. And a driver is a parent of head, arms, etc. This is implicitly exploited by stacking convolutional layers and applying pooling.

Second, the number of trainable parameters (i.e. filters) in convolutional layers does not depend on the input dimensionality, so technically we can train exactly the same model on 28×28 and 512×512 images. In other words, the model is parametric.
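The size-independence of convolutional parameters is easy to check numerically. The sketch below assumes NumPy 1.20 or newer (for numpy.lib.stride_tricks.sliding_window_view) and uses random arrays as stand-in images; like most deep learning libraries, it computes cross-correlation without flipping the kernel, and the helper name conv2d_valid is illustrative.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def conv2d_valid(image, kernel):
    """'Valid' 2D cross-correlation (what deep learning libraries call convolution)."""
    windows = sliding_window_view(image, kernel.shape)  # shape: (H-kh+1, W-kw+1, kh, kw)
    return np.einsum("ijkl,kl->ij", windows, kernel)     # dot product of each window with the kernel

rng = np.random.default_rng(0)
kernel = rng.standard_normal((3, 3))  # 9 trainable parameters, whatever the image size

small_img = rng.random((28, 28))      # stand-in for an MNIST-sized image
large_img = rng.random((512, 512))    # stand-in for a much larger image
small = conv2d_valid(small_img, kernel)
large = conv2d_valid(large_img, kernel)
print(kernel.size, small.shape, large.shape)  # 9 (26, 26) (510, 510)
```

The same 9 weights produce a 26×26 map for the small image and a 510×510 map for the large one; only the output size changes.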


All these nice properties make ConvNets less prone to overfitting (high accuracy on the training set and low accuracy on the validation/test set), more accurate in different visual tasks, and easily scalable to large images and datasets. So, when we want to solve important tasks where input data are graph-structured, it is appealing to transfer all these properties to graph neural networks (GNNs) to regularize their flexibility and make them scalable. Ideally, our goal is to develop a model that is as flexible as GNNs and can digest and learn from any data, but at the same time we want to control (regularize) factors of this flexibility by turning on/off certain priors. This can open research in many interesting directions. However, controlling this trade-off is challenging.

2.2. Convolution on images in terms of graphs

Let's consider an undirected graph G with N nodes. Edges E represent undirected connections between nodes. Nodes and edges typically come from your intuition about the problem. Our intuition in the case of images is that nodes are pixels or superpixels (groups of pixels of irregular shape) and edges reflect the spatial proximity between them. For example, the MNIST image below on the left is typically represented as a 28×28 matrix. We can also represent it as a set of N=28×28=784 pixels. So, our graph G is going to have N=784 nodes, and edges will have large values (thicker edges in the figure below) for closely located pixels and small values (thinner edges) for remote pixels.


An image from the MNIST dataset on the left and an example of its graph representation on the right. Darker
and larger nodes on the right correspond to higher pixel intensities. The figure on the right is inspired by
Figure 5 in (Fey et al., CVPR, 2018)

When we train our neural networks or ConvNets on images, we implicitly define images on a graph: a regular two-dimensional grid like the one in the figure below. Since this grid is the same for all training and test images and is regular, i.e. all pixels of the grid are connected to each other in exactly the same way across all images (they have the same number of neighbors, length of edges, etc.), this regular grid graph has no information that will help us to tell one image from another. Below I visualize some 2D and 3D regular grids, where the order of nodes is color-coded. By the way, I'm using NetworkX in Python to do that, e.g. G = networkx.grid_graph([4, 4]).
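NetworkX's grid_graph builds this kind of lattice directly; the sketch below builds a comparable 4-neighbor grid adjacency by hand with NumPy so that the regularity is visible. The helper name grid_adjacency and the row-major node order are illustrative assumptions, not NetworkX's internal ordering. Every interior node has exactly four neighbors, and only border and corner nodes differ.

```python
import numpy as np

def grid_adjacency(height, width):
    """Binary adjacency matrix of a 2D grid: each pixel is linked to its up/down/left/right neighbors."""
    n = height * width
    A = np.zeros((n, n), dtype=int)
    for r in range(height):
        for c in range(width):
            k = r * width + c  # row-major node index
            if c + 1 < width:   # neighbor to the right
                A[k, k + 1] = A[k + 1, k] = 1
            if r + 1 < height:  # neighbor below
                A[k, k + width] = A[k + width, k] = 1
    return A

A = grid_adjacency(4, 4)
degrees = A.sum(axis=1).reshape(4, 4)  # number of neighbors of each node, laid out on the grid
print(degrees)
# [[2 3 3 2]
#  [3 4 4 3]
#  [3 4 4 3]
#  [2 3 3 2]]
```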


Examples of regular 2D and 3D grids. Images are defined on 2D grids and videos are on 3D grids.

Given this 4×4 regular grid, let’s briefly look at how 2D convolution works to
understand why it’s difficult to transfer this operator to graphs. A filter on a regular
grid has the same order of nodes, but modern convolutional nets typically have
small filters, such as 3×3 in the example below. This filter has 9 values: W₁,W₂,…,
W₉, which is what we are updating during training using backprop to minimize the
loss and solve the downstream task. In our example below, we just heuristically
initialize this filter to be an edge detector (see other possible filters here):

Example of a 3×3 filter on a regular 2D grid with arbitrary weights w on the left and an edge detector on the
right.

When we perform convolution, we slide this filter in both directions: to the right and to the bottom, but nothing prevents us from starting in the bottom corner; the important thing is to slide over all possible locations. At each location, we compute the dot product between the values on the grid (let's denote them as X) and the values of the filter, W: X₁W₁+X₂W₂+…+X₉W₉, and store the result in the output image. In our visualization, we change the color of nodes during sliding to match the colors of nodes in the grid. In a regular grid, we can always match a node of the filter with a node of the grid. Unfortunately, this is not true for graphs, as I'll explain below.


2 steps of 2D convolution on a regular grid. If we don’t apply padding, there will be 4 steps in total, so the
result will be a 2×2 image. To make the resulting image larger, we need to apply padding. See a
comprehensive guide to convolution in deep learning here.
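The sliding procedure described above can be written out explicitly. This is a minimal NumPy sketch; the step-edge input X and the Laplacian-like kernel W are assumed examples and are not necessarily the values shown in the figures.

```python
import numpy as np

X = np.tile([0., 0., 1., 1.], (4, 1))  # 4x4 grid: dark left half, bright right half (a vertical edge)
W = np.array([[-1., -1., -1.],
              [-1.,  8., -1.],
              [-1., -1., -1.]])       # an edge-detecting (Laplacian-like) 3x3 filter

out = np.zeros((X.shape[0] - 2, X.shape[1] - 2))  # no padding -> 2x2 output
for i in range(out.shape[0]):          # slide down
    for j in range(out.shape[1]):      # slide right
        patch = X[i:i + 3, j:j + 3]    # the 9 grid values X1..X9 under the filter
        out[i, j] = np.sum(patch * W)  # X1*W1 + X2*W2 + ... + X9*W9
print(out)  # -> 2x2 output: [[-3, 3], [-3, 3]]
```

Without padding, a 3×3 filter fits in 4 positions on a 4×4 grid, hence the 2×2 output; the filter responds with opposite signs on either side of the edge.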

The dot product used above is one of the so-called "aggregator operators". Broadly speaking, the goal of an aggregator operator is to summarize data into a reduced form. In our example above, the dot product summarizes a 3×3 matrix into a single value. Another example is pooling in ConvNets. Keep in mind that methods such as max or sum pooling are permutation-invariant, i.e. they will pool the same value from a spatial region even if you randomly shuffle all pixels inside that region. To make it clear, the dot product is not permutation-invariant, simply because in general X₁W₁+X₂W₂ ≠ X₂W₁+X₁W₂.
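A quick numeric check of the difference between permutation-invariant pooling and the order-sensitive dot product (the values, weights and permutation below are arbitrary):

```python
import numpy as np

x = np.array([0.2, 0.9, 0.4])  # values in a pooling region
w = np.array([1.0, -1.0, 0.5])  # filter weights
perm = np.array([2, 0, 1])      # the same values, shuffled

print(x.sum(), x[perm].sum())   # sum pooling: same result (up to floating-point rounding)
print(x.max(), x[perm].max())   # max pooling: same result
print(x @ w, x[perm] @ w)       # dot product: the result depends on the order
```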

Now let’s use our MNIST image and illustrate the meaning of a regular grid, a filter
and convolution. Keeping in mind our graph terminology, this regular 28×28 grid
will be our graph G, so that every cell in this grid is a node, and node features are an
actual image X, i.e. every node will have just a single feature — pixel intensity from
0 (black) to 1 (white).


Regular 28×28 grid (left) and an image on that grid (right).

Next, we define a filter and let it be the well-known Gabor filter with some (almost) arbitrary parameters. Once we have an image and a filter, we can perform convolution by sliding the filter over that image (of digit 7 in our case) and putting the result of the dot product into the output matrix after each step.

A 28×28 filter (left) and the result of 2D convolution of this filter with the image of digit 7 (right).
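For readers who want to build a filter like the one above, a Gabor kernel can be computed from its standard definition: a Gaussian envelope multiplied by a cosine wave along a rotated axis. The parameter values below (sigma, theta, wavelength, gamma, psi) and the helper name gabor_kernel are illustrative assumptions, not the ones used for the figure; OpenCV's cv2.getGaborKernel is a ready-made alternative.

```python
import numpy as np

def gabor_kernel(size=28, sigma=4.0, theta=np.pi / 4, wavelength=8.0, gamma=0.5, psi=0.0):
    """Real part of a 2D Gabor filter centered on a size x size grid (illustrative parameters)."""
    coords = np.arange(size) - (size - 1) / 2      # pixel coordinates relative to the center
    x, y = np.meshgrid(coords, coords)
    x_rot = x * np.cos(theta) + y * np.sin(theta)  # rotate coordinates by theta
    y_rot = -x * np.sin(theta) + y * np.cos(theta)
    envelope = np.exp(-(x_rot ** 2 + gamma ** 2 * y_rot ** 2) / (2 * sigma ** 2))  # Gaussian envelope
    carrier = np.cos(2 * np.pi * x_rot / wavelength + psi)                         # oriented cosine wave
    return envelope * carrier

W = gabor_kernel()
print(W.shape)  # (28, 28)
```

Changing theta rotates the stripes, which is exactly the sense of orientation that the order-sensitive dot product makes possible.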

This is all cool, but as I mentioned before, it becomes tricky when you try to
generalize convolution to graphs.


As I have already mentioned, the dot product used above to compute convolution at each step is sensitive to the order. This sensitivity permits us to learn edge detectors, such as Gabor filters, which are important for capturing image features. The problem is that in graphs there is no well-defined order of nodes unless you learn to order them, or come up with some heuristic that will result in a consistent (canonical) order from graph to graph. In short, nodes are a set, and any permutation of this set does not change it. Therefore, the aggregator operator that people apply should be permutation-invariant. The most popular choices are averaging (GCN, Kipf & Welling, ICLR, 2017) and summation (GIN, Xu et al., ICLR, 2019) over all neighbors, i.e. mean or sum pooling, followed by a projection with trainable weights W. See Hamilton et al., NIPS, 2017 for some other aggregators.

Illustration of “convolution on graphs” of node features X with filter W centered at node 1 (dark blue).

For example, for the graph above on the left, the output of the summation aggregator for node 1 will be X₁′=(X₁+X₂+X₃+X₄)W₁, for node 2: X₂′=(X₁+X₂+X₃+X₅)W₁, and so forth for nodes 3, 4 and 5, i.e. we need to apply this aggregator to all nodes. As a result, we will have a graph with the same structure, but node features will now contain features of neighbors. We can process the graph on the right using the same idea.
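The summation aggregator can be applied to all nodes at once with a matrix product. Since the exact edges of the figure are not listed in the text, the sketch below assumes a hypothetical edge list chosen only to agree with the two formulas above (node 1 linked to nodes 2, 3, 4 and node 2 linked to nodes 1, 3, 5); self-loops are added so that each node's own feature enters its sum.

```python
import numpy as np

# Hypothetical 5-node undirected graph matching the two formulas in the text
# (node 1 -> 2, 3, 4; node 2 -> 1, 3, 5). Nodes are 0-indexed here.
edges = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 4)]
N, C, F = 5, 1, 1  # one input feature and one output feature per node, as in the example
A = np.zeros((N, N))
for i, j in edges:
    A[i, j] = A[j, i] = 1.0
A_hat = A + np.eye(N)  # self-loops: each node's own feature is included in its sum

rng = np.random.default_rng(0)
X = rng.random((N, C))  # node features X1..X5
W = rng.random((C, F))  # a single trainable weight W1

Z = A_hat @ X @ W       # summation aggregator applied to every node at once
print(np.isclose(Z[0, 0], (X[0, 0] + X[1, 0] + X[2, 0] + X[3, 0]) * W[0, 0]))  # True
print(np.isclose(Z[1, 0], (X[0, 0] + X[1, 0] + X[2, 0] + X[4, 0]) * W[0, 0]))  # True
```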

Colloquially, people call this averaging or summation "convolution", since we also "slide" from one node to another and apply an aggregator operator at each step. However, it's important to keep in mind that this is a very specific form of convolution, where filters don't have a sense of orientation. Below I'll show what those filters look like and give an idea of how to make them better.

3. What makes a neural network a graph neural network?

You know how a classical neural network works, right? We have some C-dimensional features X as the input to the net. Using our running MNIST example, X will be our C=784 dimensional pixel features (i.e. a "flattened" image). These features get multiplied by C×F dimensional weights W that we update during training to get the output closer to what we expect. The result can be directly used to solve the task (e.g. in the case of regression) or can be further fed to some nonlinearity (activation), like ReLU, or other differentiable (or, more precisely, sub-differentiable) functions to form a multi-layer network. In general, the output of some layer l is X⁽ˡ⁺¹⁾ = X⁽ˡ⁾W⁽ˡ⁾.

Fully-connected layer with learnable weights W. “Fully-connected” means that each output value in X⁽ˡ⁺¹⁾
depends on, or “connected to”, all inputs X⁽ˡ⁾. Typically, although not always, we add a bias term to the output.

The signal in MNIST is so strong that you can get an accuracy of 91% by just using the formula above and the cross-entropy loss without any nonlinearities or other tricks (I used a slightly modified PyTorch example to do that). Such a model is called multinomial (or multiclass, since we have 10 classes of digits) logistic regression.
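The model just described, a single fully-connected layer plus a bias followed by the cross-entropy loss, can be sketched in a few lines of NumPy. This is only a forward pass under stated assumptions: the batch is random stand-in data (not MNIST) and the weights are untrained; the point is the shapes, with 784 flattened pixels in and 10 class scores out.

```python
import numpy as np

rng = np.random.default_rng(0)
B, C, K = 32, 784, 10                   # batch size, input features (28*28 pixels), classes
X = rng.random((B, C))                  # stand-in for a batch of flattened images
y = rng.integers(0, K, size=B)          # stand-in digit labels
W = 0.01 * rng.standard_normal((C, K))  # trainable weights
b = np.zeros(K)                         # trainable bias

logits = X @ W + b                                  # fully-connected layer, no nonlinearity
logits -= logits.max(axis=1, keepdims=True)         # shift for numerical stability
probs = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)  # softmax over 10 classes
loss = -np.log(probs[np.arange(B), y]).mean()       # cross-entropy loss
print(probs.shape, loss)
```

In practice the weights would be updated by gradient descent on this loss, which is what the PyTorch example does automatically.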

Now, how do we transform our vanilla neural network to a graph neural network?
As you already know, the core idea behind GNNs is aggregation over “neighbors”.
Here, it is important to understand that in many cases, it is actually you who
specifies “neighbors”.


Let's consider a simple case first, when you are given some graph. For example, this can be a fragment (subgraph) of a social network with 5 persons, where an edge between a pair of nodes denotes whether two people are friends (or at least one of them thinks so). An adjacency matrix (usually denoted as A), shown in the figure below on the right, is a way to represent these edges in matrix form, convenient for our deep learning frameworks. Yellow cells in the matrix represent edges and blue cells represent the absence of an edge.

Example of a graph and its adjacency matrix. The order of nodes we defined in both cases is random, while
the graph is still the same.
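Building an adjacency matrix from a list of edges takes only a few lines. The sketch below uses a hypothetical edge list (the figure's exact friendships are not given in the text) and checks the point made in the caption: relabeling the nodes changes the matrix to P A Pᵀ for some permutation matrix P, but it still describes the same graph.

```python
import numpy as np

# Hypothetical friendship edges among 5 people (0-indexed); not the exact graph in the figure.
edges = [(0, 1), (0, 2), (1, 3), (2, 3), (3, 4)]
N = 5
A = np.zeros((N, N), dtype=int)
for i, j in edges:
    A[i, j] = A[j, i] = 1  # undirected graph: each friendship is stored in both directions

# Relabel the nodes in a different (random) order using a permutation matrix P.
perm = np.random.default_rng(0).permutation(N)
P = np.eye(N, dtype=int)[perm]
A_perm = P @ A @ P.T  # adjacency matrix of the relabeled graph

print((A == A.T).all())                               # True: undirected -> symmetric
print(np.array_equal(A_perm, A[np.ix_(perm, perm)]))  # True: same as reindexing rows and columns
print(sorted(A.sum(axis=1).tolist()) == sorted(A_perm.sum(axis=1).tolist()))  # True: degrees preserved
```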

Now, let's create an adjacency matrix A for our MNIST example based on the coordinates of pixels (complete code is provided at the end of the post):

```python
import numpy as np
from scipy.spatial.distance import cdist

img_size = 28  # MNIST image width and height

col, row = np.meshgrid(np.arange(img_size), np.arange(img_size))
coord = np.stack((col, row), axis=2).reshape(-1, 2) / img_size  # (784, 2) pixel coordinates
dist = cdist(coord, coord)  # see figure below on the left
sigma = 0.2 * np.pi  # width of a Gaussian
A = np.exp(- dist ** 2 / sigma ** 2)  # see figure below in the middle
```

This is a typical, but not the only, way to define an adjacency matrix for visual tasks (Defferrard et al., NIPS, 2016; Bronstein et al., 2016). This adjacency matrix is our prior, or our inductive bias, which we impose on the model based on our intuition that nearby pixels should be connected and remote pixels should not be connected or should have only a very thin edge (an edge of small value). This is motivated by observations that in natural images nearby pixels often correspond to the same object or to objects that interact frequently (the locality principle we mentioned in Section 2.1), so it makes a lot of sense to connect such pixels.


Adjacency matrix (NxN) in the form of distances (left) and closeness (middle) between all pairs of nodes.
(right) A subgraph with 16 neighboring pixels corresponding to the adjacency matrix in the middle. Since it’s
a complete subgraph, it’s also called a “clique”.

So, now instead of having just features X, we have some fancy matrix A with values in the range [0,1]. It's important to note that once we know that our input is a graph, we assume that there is no canonical order of nodes that will be consistent across all other graphs in the dataset. In terms of images, it means that pixels are assumed to be randomly shuffled. Finding a canonical order of nodes is, in general, a combinatorially hard problem that is impractical to solve. Even though for MNIST we technically can cheat by knowing this order (because the data originally come from a regular grid), it's not going to work on actual graph datasets.

Remember that our matrix of features X has N rows and C columns. So, in terms of graphs, each row corresponds to one node and C is the dimensionality of node features. But now the problem is that we don't know the order of nodes, so we don't know in which row to put the features of a particular node. If we just ignore this problem and feed X directly to an MLP as we did before, the effect will be the same as feeding images with randomly shuffled pixels, with independent (yet the same for each epoch) shuffling for each image! Surprisingly, a neural network can in principle still fit such random data (Zhang et al., ICLR, 2017); however, test performance will be close to random prediction. One of the solutions is to simply use the adjacency matrix A we created before, in the following way: X⁽ˡ⁺¹⁾ = 𝓐X⁽ˡ⁾W⁽ˡ⁾.

Graph neural layer with adjacency matrix A, input/output features X and learnable weights W.


We just need to make sure that row i in A corresponds to the features of the node in row i of X. Here, I'm using 𝓐 instead of plain A, because often you want to normalize A. If 𝓐=A, the matrix multiplication 𝓐X⁽ˡ⁾ will be equivalent to summing the features of neighbors, which turned out to be useful in many tasks (Xu et al., ICLR, 2019). Most commonly, you normalize it so that 𝓐X⁽ˡ⁾ averages the features of neighbors, i.e. 𝓐=D⁻¹A, where D is the diagonal degree matrix with Dᵢᵢ=ΣⱼAᵢⱼ (each row of A is divided by its sum). A better way to normalize matrix A can be found in (Kipf & Welling, ICLR, 2017).
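The normalization choices mentioned above can be compared side by side. This is a minimal NumPy sketch for dense matrices (the helper normalize_adjacency and its mode names are illustrative): "sum" keeps 𝓐=A, "mean" divides each row by the node degree, and "gcn" applies the symmetric normalization of A+I (self-loops added) by the square roots of its degrees, as in Kipf & Welling (ICLR, 2017). The "mean" option assumes every node has at least one neighbor.

```python
import numpy as np

def normalize_adjacency(A, mode="mean"):
    """Return a normalized dense adjacency matrix for the graph layer."""
    if mode == "sum":    # A_norm = A: A_norm @ X sums neighbor features
        return A
    if mode == "mean":   # A_norm = D^-1 A: rows sum to 1, so A_norm @ X averages neighbor features
        return A / A.sum(axis=1, keepdims=True)
    if mode == "gcn":    # A_norm = D~^-1/2 (A + I) D~^-1/2, with D~ the degree matrix of A + I
        A_tilde = A + np.eye(A.shape[0])
        d_inv_sqrt = 1.0 / np.sqrt(A_tilde.sum(axis=1))
        return d_inv_sqrt[:, None] * A_tilde * d_inv_sqrt[None, :]
    raise ValueError(f"unknown mode: {mode}")

A = np.array([[0., 1., 1.],
              [1., 0., 0.],
              [1., 0., 0.]])  # a small star graph centered at node 0
X = np.array([[1.], [2.], [4.]])
print(normalize_adjacency(A, "sum") @ X)   # sums of neighbor features: 6, 1, 1
print(normalize_adjacency(A, "mean") @ X)  # averages of neighbor features: 3, 1, 1
```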

Below is a comparison of an NN layer and a GNN layer in PyTorch code:

```python
import torch
import torch.nn as nn

C = 2  # Input feature dimensionality
F = 8  # Output feature dimensionality
W = nn.Linear(in_features=C, out_features=F)  # Trainable weights

# Fully connected layer
X = torch.randn(1, C)  # Input features
Z = W(X)  # Output features: torch.Size([1, 8])

# Graph Neural Network layer
N = 6  # Number of nodes in a graph
X = torch.randn(N, C)  # Input features
A = torch.rand(N, N)  # Adjacency matrix (edges of a graph)
Z = W(torch.mm(A, X))  # Output features: torch.Size([6, 8])
```


And here is the full PyTorch code to train the two models above: python mnist_fc.py --model fc to train the NN case; python mnist_fc.py --model graph to train the GNN case. As an exercise, try to randomly shuffle pixels in the code in the --model graph case (don't forget to shuffle A in the same way) and make sure that it will not affect the result. Is it going to be true for the --model fc case?
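One property behind this exercise can be verified directly: if the nodes are shuffled by a permutation and A is shuffled consistently (both rows and columns), the output of the graph layer is shuffled in exactly the same way, i.e. the layer is permutation-equivariant. The NumPy sketch below mirrors the PyTorch GNN layer above without the bias (the helper name graph_layer is illustrative); whether the final accuracy is unaffected also depends on the layers that come after it.

```python
import numpy as np

rng = np.random.default_rng(0)
N, C, F = 6, 2, 8
A = rng.random((N, N))           # adjacency matrix (edges of a graph)
X = rng.standard_normal((N, C))  # node features
W = rng.standard_normal((C, F))  # trainable weights

def graph_layer(A, X, W):
    return A @ X @ W  # Z = A X W: the GNN layer above, without the bias

perm = rng.permutation(N)        # shuffle the nodes (pixels)...
A_shuf = A[np.ix_(perm, perm)]   # ...and shuffle rows AND columns of A in the same way
X_shuf = X[perm]

Z = graph_layer(A, X, W)
Z_shuf = graph_layer(A_shuf, X_shuf, W)
print(np.allclose(Z_shuf, Z[perm]))  # True: output rows are shuffled in exactly the same way
```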


After running the code, you may notice that the classification accuracy is actually about the same. What's the problem? Aren't graph networks supposed to work better? Well, they are, in many cases. But not in this one, because the 𝓐X⁽ˡ⁾ operator we added is actually nothing but a Gaussian filter:


2D visualization of a filter used in a graph neural network and its effect on the image.
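To see why 𝓐X⁽ˡ⁾ acts as a Gaussian filter, we can pass a single bright pixel through it. This NumPy sketch rebuilds the Gaussian adjacency from the code above (computing pairwise Euclidean distances with broadcasting instead of scipy's cdist, which gives the same values) and shows that the response to an impulse is a Gaussian bump centered on that pixel; applied to a whole image, the same fixed operator therefore blurs it. The pixel position (10, 5) is arbitrary.

```python
import numpy as np

img_size = 28
col, row = np.meshgrid(np.arange(img_size), np.arange(img_size))
coord = np.stack((col, row), axis=2).reshape(-1, 2) / img_size
dist = np.sqrt(((coord[:, None, :] - coord[None, :, :]) ** 2).sum(-1))  # same as cdist(coord, coord)
sigma = 0.2 * np.pi
A = np.exp(-dist ** 2 / sigma ** 2)  # Gaussian adjacency, as in the code above

# Feed a single bright pixel (an impulse) through the graph operator A x.
x = np.zeros(img_size * img_size)
k = 10 * img_size + 5  # pixel at row 10, column 5
x[k] = 1.0
response = (A @ x).reshape(img_size, img_size)

# The response is a Gaussian bump centered on that pixel: the operator is a fixed blur.
center = tuple(int(i) for i in np.unravel_index(response.argmax(), response.shape))
print(center)  # (10, 5)
```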

So, our graph neural network turned out to be equivalent to a convolutional neural network with a single Gaussian filter that we never update during training, followed by the fully-connected layer. This filter basically blurs/smooths the image, which is not a particularly useful thing to do (see the image above on the right). However, this is the simplest variant of a graph neural network, which nevertheless works well on graph-structured data. To make GNNs work better on regular graphs, like images, we need to apply a bunch of tricks. For example, instead of using a predefined Gaussian filter, we can learn to predict an edge between any pair of pixels by using a differentiable function like this:

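```python
import torch.nn as nn  # using PyTorch

edge_predictor = nn.Sequential(  # variable name is illustrative
    nn.Linear(4, 64),  # map coordinates of a pair of pixels (x_i, y_i, x_j, y_j) to a hidden layer
    nn.ReLU(),         # nonlinearity
    nn.Linear(64, 1),  # map hidden representation to an edge value
    nn.Tanh())         # squash edge values to [-1, 1]
```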
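To see how such an edge predictor replaces the fixed Gaussian, note that it takes 4 inputs, which we read as the (x, y) coordinates of two pixels, and outputs one edge value per pair. Below is a NumPy sketch of the forward pass only, assuming randomly initialized (untrained) weights in place of the nn.Linear layers; a small 8×8 grid keeps the N×N array of pairs small. In the real model these weights would be trained jointly with the classifier by backprop.

```python
import numpy as np

img_size = 8  # small grid for illustration; 28x28 would give 784 * 784 pairs
col, row = np.meshgrid(np.arange(img_size), np.arange(img_size))
coord = np.stack((col, row), axis=2).reshape(-1, 2) / img_size  # (N, 2) pixel coordinates
N = coord.shape[0]

# All pairs of pixels: row i*N + j holds (x_i, y_i, x_j, y_j), the 4 inputs of the edge predictor.
pairs = np.concatenate([np.repeat(coord, N, axis=0), np.tile(coord, (N, 1))], axis=1)

rng = np.random.default_rng(0)
W1, b1 = 0.5 * rng.standard_normal((4, 64)), np.zeros(64)  # plays the role of nn.Linear(4, 64)
W2, b2 = 0.5 * rng.standard_normal((64, 1)), np.zeros(1)   # plays the role of nn.Linear(64, 1)

hidden = np.maximum(pairs @ W1 + b1, 0.0)  # ReLU
edges = np.tanh(hidden @ W2 + b2)          # squash edge values to [-1, 1]
A_learned = edges.reshape(N, N)            # row i holds the predicted edges from pixel i
print(A_learned.shape)  # (64, 64)
```

Unlike the Gaussian adjacency, nothing here forces the predicted edge from i to j to equal the edge from j to i, and edges can be negative.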


This idea is similar to Dynamic Filter Networks (Brabander et al., NIPS, 2016), Edge-conditioned Graph Networks (ECC, Simonovsky & Komodakis, CVPR, 2017) and (Knyazev et al., NeurIPS-W, 2018). To try it using my code, you just need to add the --pred_edge flag, so the entire command is python mnist_fc.py --model graph --pred_edge. Below I show the animation of the predefined Gaussian and learned filters. You may notice that the filter we just learned (in the middle) looks weird. That's because the task is quite complicated, since we optimize two models at the same time: the model that predicts edges and the model that predicts a digit class. To learn better filters (like the one on the right), we need to apply some other tricks from our BMVC paper, which are beyond the scope of this part of the tutorial.

2D filter of a graph neural network centered in the red point. Averaging (left, accuracy 92.24%), learned
based on coordinates (middle, accuracy 91.05%), learned based on coordinates with some tricks (right,
accuracy 92.39%).

The code to generate these GIFs is quite simple:


```python
import imageio  # to save GIFs
import matplotlib as mpl
import numpy as np
from scipy.spatial.distance import cdist
import cv2  # optional (for resizing the filter to look better)

img_size = 28
# Create/load some adjacency matrix A (for example, based on coordinates)
col, row = np.meshgrid(np.arange(img_size), np.arange(img_size))
coord = np.stack((col, row), axis=2).reshape(-1, 2) / img_size
dist = cdist(coord, coord)  # distances between all pairs of pixels
sigma = 0.2 * np.pi  # width of a Gaussian (can be a hyperparameter when training a model)

# Adjacency matrix of spatial similarity; dist is squared to make it a Gaussian
# (the GIFs in this post were generated without squaring, see the update below)
A = np.exp(- dist ** 2 / sigma ** 2)

scale = 4
img_list = []
cmap = mpl.colormaps['viridis']  # replaces mpl.cm.get_cmap, which recent Matplotlib versions removed
for i in np.arange(0, img_size, 4):  # for every row with step 4
    for j in np.arange(0, img_size, 4):  # for every col with step 4
        k = i * img_size + j  # index of the node (pixel) at row i, column j
        img = A[k, :].reshape(img_size, img_size)  # edges from node k = the filter centered at k
        img = (img - img.min()) / (img.max() - img.min())  # rescale to [0, 1]
        img = np.ascontiguousarray(cmap(img)[..., :3])  # map to RGB colors (drop the alpha channel)
        img[i, j] = np.array([1., 0., 0.])  # add the red dot at the filter's center
        img = cv2.resize(img, (img_size * scale, img_size * scale))
        img_list.append((img * 255).astype(np.uint8))
imageio.mimsave('filter.gif', img_list, format='GIF', duration=0.2)
```


I’m also sharing an IPython notebook showing 2D convolution of an image with a Gabor filter in terms of graphs (using an adjacency matrix) compared to using circulant matrices, which are often used in signal processing.
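The sketch below is not the notebook from the post; it is a minimal illustration of the same idea, assuming periodic (wrap-around) image boundaries and a small Gabor kernel with illustrative parameters. It builds the N x N matrix C whose row k holds the weights that pixel k assigns to every pixel, just like a row of an adjacency matrix. With wrap-around boundaries, C is block-circulant with circulant blocks, and multiplying it by the flattened image reproduces 2D cross-correlation (what deep learning libraries call "convolution") computed directly with np.roll. The function names and Gabor parameters are hypothetical.

```python
import numpy as np


def gabor_kernel(size=5, sigma=1.5, theta=0.0, wavelength=4.0, gamma=0.5, psi=0.0):
    """Real part of a Gabor filter on a size x size grid (illustrative parameters)."""
    r = size // 2
    y, x = np.mgrid[-r:r + 1, -r:r + 1].astype(np.float64)
    x_rot = x * np.cos(theta) + y * np.sin(theta)
    y_rot = -x * np.sin(theta) + y * np.cos(theta)
    envelope = np.exp(-(x_rot ** 2 + gamma ** 2 * y_rot ** 2) / (2 * sigma ** 2))
    return envelope * np.cos(2 * np.pi * x_rot / wavelength + psi)


def correlation_matrix(kernel, n):
    """N x N matrix (N = n*n) such that C @ img.ravel() is the wrap-around
    cross-correlation of an n x n image with `kernel`. Row k acts as the
    'adjacency' of pixel k: the weight it gives to every other pixel."""
    r = kernel.shape[0] // 2
    C = np.zeros((n * n, n * n))
    for i in range(n):
        for j in range(n):
            k = i * n + j
            for a in range(-r, r + 1):
                for b in range(-r, r + 1):
                    l = ((i + a) % n) * n + (j + b) % n
                    C[k, l] += kernel[a + r, b + r]
    return C


def correlate_wrap(img, kernel):
    """Reference: direct sliding-window cross-correlation with periodic boundaries."""
    r = kernel.shape[0] // 2
    out = np.zeros_like(img, dtype=np.float64)
    for a in range(-r, r + 1):
        for b in range(-r, r + 1):
            # np.roll(img, (-a, -b))[i, j] == img[(i + a) % n, (j + b) % n]
            out += kernel[a + r, b + r] * np.roll(img, shift=(-a, -b), axis=(0, 1))
    return out


n = 16
rng = np.random.default_rng(0)
img = rng.random((n, n))
K = gabor_kernel(size=5, theta=np.pi / 4)

C = correlation_matrix(K, n)
y_graph = (C @ img.ravel()).reshape(n, n)  # graph view: adjacency-like matrix times signal
y_direct = correlate_wrap(img, K)          # classical sliding-window view
print(np.allclose(y_graph, y_direct))      # True
```

Every row of C is a shifted copy of row 0, which is what makes it circulant; a general graph adjacency matrix has no such shift structure, which is one way to see why convolution is harder to define on graphs.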

In the next part of the tutorial, I’ll tell you about more advanced graph layers that
can lead to better filters on graphs.

Update:

Throughout this blog post and in the code, the dist variable should have been squared to make it a Gaussian. Thanks to Alfredo Canziani for spotting that. All figures and results were generated without squaring it. If you observe very different results after squaring it, I suggest tuning sigma.
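To see why results may change after the fix, the sketch below compares the kernel used for the figures, exp(-d / sigma^2), with the Gaussian exp(-d^2 / sigma^2), using the same normalized pixel coordinates and sigma = 0.2π as in the GIF code. Because coordinates are divided by img_size, most pairwise distances d are below 1, and for d < 1 we have d^2 < d, so the squared version gives larger weights to distant pixels, i.e. a wider filter. This is one concrete reason sigma may need re-tuning. The variable names A_used and A_gauss are illustrative.

```python
import numpy as np
from scipy.spatial.distance import cdist

img_size = 28
sigma = 0.2 * np.pi  # same value as in the GIF code

col, row = np.meshgrid(np.arange(img_size), np.arange(img_size))
coord = np.stack((col, row), axis=2).reshape(-1, 2) / img_size
dist = cdist(coord, coord)

A_used = np.exp(-dist / sigma ** 2)        # kernel used for the figures (distance not squared)
A_gauss = np.exp(-dist ** 2 / sigma ** 2)  # proper Gaussian kernel (distance squared)

# For normalized distances d < 1, d**2 < d, so the Gaussian weights are at least as large
close = dist < 1
print(bool(np.all(A_gauss[close] >= A_used[close])))

# Compare the weight given to a pixel 7 px to the right of the center pixel (14, 14)
k = 14 * img_size + 14
for name, A in [("not squared", A_used), ("squared", A_gauss)]:
    w = A[k].reshape(img_size, img_size)
    print(name, "weight at 7 px:", round(float(w[14, 21]), 3))
```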

Conclusion
Graph Neural Networks are a very flexible and interesting family of neural networks that can be applied to highly complex data. As always, such flexibility comes at a cost. In the case of GNNs, it is the difficulty of regularizing the model by defining operators such as convolution. Research in that direction is advancing quite fast, so GNNs are likely to find applications in increasingly wide areas of machine learning and computer vision.

See another nice blog post about GNNs from Neptune.ai.

Acknowledgement: A large portion of this tutorial was prepared during my internship at SRI International under the supervision of Mohamed Amer (homepage) and my PhD advisor Graham Taylor (homepage).

Find me on GitHub, LinkedIn and Twitter. My homepage.

If you want to cite this tutorial in your paper, please use:

@misc{knyazev2019tutorial,
title={Tutorial on Graph Neural Networks for Computer Vision and Beyond},
author={Knyazev, Boris and Taylor, Graham W and Amer, Mohamed R},
year={2019}
}

