Graph Neural Nets
Tasks of GNNs
① Node-level prediction: structure
② Graph- or subgraph-level prediction: drug discovery, physics simulations
* The adjacency matrix is extremely sparse.
Connectivity
An undirected connected graph is one that has at least one path between each two nodes.
A directed graph is weakly connected if it becomes connected when we ignore the direction of edges.
Strongly Connected Component (SCC): a subgraph in which every node can reach every other node along the edge directions.

AlphaFold's key idea: a spatial graph over amino acids, with edges given by proximity between amino acids.

Bipartite graph: a graph whose nodes can be divided into two sets U and V such that U and V are two independent sets and nodes of one set only interact with nodes of the other set.
Traditional ML Pipeline
train an ML model -> feed a new graph/node -> obtain features -> predict
Given G = (V, E), learn a function f: V → ℝ.
Node features characterize the properties of a node.
Graphlet Degree Vector (GDV): counts the graphlets that a node takes part in.
Eigenvector centrality: the centrality of a node is the weighted sum of its neighbors' centralities. This is a recursive definition:
c_v = (1/λ) Σ_{u ∈ N(v)} c_u, i.e. λc = Ac
λ: some constant (an eigenvalue), A: adjacency matrix, c: centrality vector.
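A minimal sketch of solving λc = Ac by power iteration, on a small hand-made adjacency matrix (the graph here is an arbitrary example, not from the notes):

    import numpy as np

    # Undirected triangle plus a pendant node (toy example graph).
    A = np.array([[0, 1, 1, 0],
                  [1, 0, 1, 0],
                  [1, 1, 0, 1],
                  [0, 0, 1, 0]], dtype=float)

    c = np.ones(A.shape[0])        # start from uniform centrality
    for _ in range(100):           # power iteration: c <- Ac / ||Ac||
        c = A @ c
        c = c / np.linalg.norm(c)

    lam = c @ A @ c                # Rayleigh quotient ~ leading eigenvalue
    print(lam, c)                  # node 2 (highest degree) scores highest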
Betweenness Centrality: a node has a high value if it lies on many shortest paths between other nodes.
Clustering Coefficient: measures how connected a node's neighbors are.
Edge (link-level) prediction: based on existing links, rank all node pairs and select the top K pairs. Two settings:
1. links missing at random
2. links appearing over time (e.g. social networks)
The ranking depends on our scoring system:

Graph-level: kernel methods measure similarity between whole graphs. A kernel must be positive semidefinite. One type of kernel is the Graphlet Kernel, and there are other types as well.
Local Neighborhood Overlap
Common neighbors: the number of common neighbors of the two nodes, |N(u) ∩ N(v)|.
Jaccard's coefficient: |N(u) ∩ N(v)| / |N(u) ∪ N(v)|.
If two nodes share no neighbors, both scores are always 0.
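Both overlap scores in a few lines of plain Python, on an assumed toy adjacency dict:

    # Toy undirected graph as an adjacency dict (assumed example).
    neighbors = {
        "a": {"b", "c"},
        "b": {"a", "c", "d"},
        "c": {"a", "b"},
        "d": {"b"},
    }

    def common_neighbors(u, v):
        return len(neighbors[u] & neighbors[v])

    def jaccard(u, v):
        inter = neighbors[u] & neighbors[v]
        union = neighbors[u] | neighbors[v]
        return len(inter) / len(union)

    print(common_neighbors("a", "b"))  # 1 (just "c")
    print(jaccard("a", "b"))           # 0.25: {"c"} over {"a","b","c","d"}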
Global Neighborhood Overlap
Theorem: the number of paths of length K between two nodes is (A^K)_{uv}, i.e. powers of the adjacency matrix. We use the Katz index, which counts the (discounted) paths of all lengths between a pair of nodes.
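The theorem and the Katz index in numpy; the discount factor β below is an arbitrary choice for the sketch, it only has to be smaller than 1 over the largest eigenvalue of A:

    import numpy as np

    A = np.array([[0, 1, 0],
                  [1, 0, 1],
                  [0, 1, 0]], dtype=float)   # path graph 0-1-2 (toy example)

    print(np.linalg.matrix_power(A, 2))      # (A^2)[u][v] = #paths of length 2

    # Katz index: S = sum_{k>=1} beta^k A^k = (I - beta*A)^(-1) - I
    beta = 0.1                               # needs beta < 1/lambda_max
    I = np.eye(A.shape[0])
    S = np.linalg.inv(I - beta * A) - I
    print(S)                                 # S[u][v] = Katz score of pair (u,v)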
So in traditional ML for graphs we do feature engineering (at the node, edge, or graph level). Can we learn the features automatically?
↓
Node Embeddings (node2vec)
Our goal: similarity in the embedding space must approximate similarity in the graph.
node2vec is similar to a plain random walk but uses a biased walk that interpolates between two views:
BFS gives a local, microscopic view.
DFS gives a global, macroscopic view.
Its two parameters: 1. return parameter p  2. in-out parameter q.
Random Walk
Given a graph and a starting point, we repeatedly move to a randomly chosen neighbor (normalize the edge weights, flip a coin!) and record the sequence of nodes visited.
node2vec performs the optimization with objective
max Σ_{u ∈ V} log P(N_R(u) | z_u)
where N_R(u) is the neighborhood of u obtained by walk strategy R: we want to learn representations that predict which nodes appear in random walks.
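A minimal sketch of an unbiased random walk used to collect N_R(u); node2vec would bias the neighbor choice with p and q, which is omitted here (toy adjacency list assumed):

    import random

    # Toy adjacency list (assumed example graph).
    adj = {0: [1, 2], 1: [0, 2, 3], 2: [0, 1], 3: [1]}

    def random_walk(start, length):
        """Uniform random walk: at each step, 'flip a coin' over neighbors."""
        walk = [start]
        for _ in range(length):
            walk.append(random.choice(adj[walk[-1]]))
        return walk

    print(random_walk(0, 5))   # e.g. [0, 2, 1, 3, 1, 0]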
Graph Embeddings
A simple idea is to just average the embeddings of the nodes in a graph; there are more advanced methods (later).
PageRank
The idea behind Google's PageRank was: "Can we rank webpages?" The web is a collection of webpages connected by links: webpage = node, link = edge.
Solution: let page j have d_j out-links; if j → i, then M_ij = 1/d_j, so the columns of M sum up to 1 (M is the stochastic adjacency matrix) and the rank vector satisfies r = M · r.
PageRank is solved using power iteration:
1. initialize r^(0) = [1/N, …, 1/N]^T
2. iterate r^(t+1) = M · r^(t)
3. stop when |r^(t+1) − r^(t)|_1 < ε
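Power iteration exactly as in the three steps above, on an assumed 3-page toy web (note the full PageRank also adds a teleport/damping term, skipped here):

    import numpy as np

    # Column-stochastic M for a 3-page web: columns sum to 1 (toy example).
    M = np.array([[0.0, 0.5, 1.0],
                  [0.5, 0.0, 0.0],
                  [0.5, 0.5, 0.0]])

    N = M.shape[0]
    r = np.full(N, 1.0 / N)                   # 1. r(0) = [1/N, ..., 1/N]
    while True:
        r_next = M @ r                        # 2. r(t+1) = M r(t)
        if np.abs(r_next - r).sum() < 1e-9:   # 3. stop when L1 change < eps
            break
        r = r_next
    print(r)                                  # stationary rank vector, sums to 1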
ML with Graphs: Node Classification
An example is classifying the nodes in a graph. A naive approach is to feed the adjacency matrix A to a feed-forward network; the problem is that this depends on the node ordering and does not transfer to graphs of other sizes.
Similar to PageRank, the labels of a node are influenced by its neighbors:
homophily: individual characteristics -> social connections
influence: social connections -> individual characteristics
One way is collective classification, where we use message passing in graphs.
Graph Convolutional Network (GCN), Graph Attention (GAT), and friends: different algorithms differ in how they aggregate messages and what messages they pass between nodes.
1) Message computation:
m_u = MSG(h_u), e.g. m_u = W h_u for a linear layer.
In GAT we additionally learn attention weights for the relationship between a node v and its neighbors u.
2) Aggregation: must be order-invariant, as the order of the input nodes should not matter. Examples: Sum(), Mean(), Max().
We compute a message from node v itself as well, using a separate weight matrix, and combine:
h_v = CONCAT(AGG({m_u : u ∈ N(v)}), m_v)
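A single mean-aggregation message-passing layer following the two steps above, as a numpy sketch; the toy graph, random weights, and dimensions are all assumptions for illustration:

    import numpy as np

    rng = np.random.default_rng(0)
    adj = {0: [1, 2], 1: [0], 2: [0]}   # toy graph
    H = rng.normal(size=(3, 4))         # node features, d = 4

    W_nbr = rng.normal(size=(4, 4))     # message weights for neighbors
    W_self = rng.normal(size=(4, 4))    # separate weights for the node itself

    def gnn_layer(H):
        H_new = []
        for v, nbrs in adj.items():
            msgs = np.stack([H[u] @ W_nbr for u in nbrs])  # 1) message computation
            agg = msgs.mean(axis=0)                        # 2) order-invariant aggregation
            h_v = np.concatenate([agg, H[v] @ W_self])     # combine with self message
            H_new.append(np.maximum(h_v, 0))               # nonlinearity (ReLU)
        return np.stack(H_new)

    print(gnn_layer(H).shape)   # (3, 8): the concat doubles the dimension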
So, in contrast with other domains, adding more layers to a GNN does not necessarily yield better results. A single GNN layer:
Linear
↓
BatchNorm
↓
Dropout
↓
Activation
↓
Attention (cf. transformers)
↓
Aggregation
Stacking and Augmentation
A standard way of building a GNN from a layer is to simply stack layers on top of each other. The problem of over-smoothing occurs if many layers are stacked: the embeddings of the nodes converge.
Receptive field: the set of nodes that determine the embedding of a node; with many layers the receptive fields overlap heavily, which is what makes the embeddings converge.

Graph Augmentation
In many cases the simple input graph may not be suitable for computation (too sparse / too dense).
Feature augmentation: sometimes the input graph is just an adjacency matrix and the nodes lack any features. Solution: add unique one-hot IDs to the nodes, or a constant feature.
Structural augmentation: add virtual nodes or edges (like 2-hop neighbors) to a sparse graph; if the graph is dense, sample a neighborhood when training.
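The one-hot feature augmentation above is just the identity matrix; a tiny sketch (adjacency-only toy input assumed):

    import numpy as np

    A = np.array([[0, 1, 0],
                  [1, 0, 1],
                  [0, 1, 0]])          # adjacency only, no node features

    X_onehot = np.eye(A.shape[0])      # unique one-hot ID per node
    X_const = np.ones((A.shape[0], 1)) # alternative: constant feature
    print(X_onehot)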
A key factor in the expressiveness of GNNs (among other things, such as the loss function and the evaluation) is the aggregation function being able to distinguish different graphs. The most expressive GNN is the one which uses an injective aggregation function, in general.
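Why injectivity matters, in one toy comparison: mean and max confuse the neighbor multisets {1, 1, 2, 2} and {1, 2}, while sum keeps them apart (GIN builds on sum for exactly this reason):

    import numpy as np

    a = np.array([1.0, 1.0, 2.0, 2.0])   # two different neighbor multisets
    b = np.array([1.0, 2.0])

    print(a.mean() == b.mean())  # True  -> mean cannot distinguish them
    print(a.max() == b.max())    # True  -> max cannot either
    print(a.sum() == b.sum())    # False -> sum tells them apart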
Train/Test Split in Graphs: is there something special about GNNs?
If a graph neural net has no node features, nodes with identical computation graphs are indistinguishable: emb(s) = emb(t).
Knowledge Graphs
A knowledge graph is a graph where every node and edge is labeled by a type (biomedical or event graphs); a GCN generalized to such graphs is the RGCN (below).

TransE: embeds facts so that h + r ≈ t whenever the fact (h, r, t) is true.
Scoring function: f_r(h, t) = −‖h + r − t‖.
TransE is not able to capture symmetric relations.
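The TransE score in numpy, with a comment on why symmetry fails; the vectors are random stand-ins, not trained embeddings:

    import numpy as np

    rng = np.random.default_rng(1)
    h, r, t = rng.normal(size=(3, 8))       # random stand-in embeddings

    def f_transe(h, r, t):
        return -np.linalg.norm(h + r - t)   # high (near 0) iff h + r ~ t

    print(f_transe(h, r, t))
    # A symmetric relation would need h + r ~ t AND t + r ~ h at once,
    # which forces r ~ 0 and h ~ t: hence TransE cannot capture it.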
TransR: uses a projection matrix M_r to transform from entity space to relation space and then performs TransE. It does NOT support composition of relations (composition: "my mother's husband is my father").

DistMult: a bilinear score, f_r(h, t) = Σ_i h_i · r_i · t_i. It can be thought of as the cosine similarity between h ∘ r and t.
This means f_r(h, t) = f_r(t, h), so DistMult handles symmetric relations, but it also cannot model composition relations.

RGCN: a GCN over knowledge graphs with one weight matrix W_r per relation type, which means rapid growth of the number of parameters (overfitting). Two fixes:
1. Block Diagonal Matrices: make the weight matrices sparse.
2. Basis learning (weight sharing): represent W_r as a linear combination of a set of basis matrices,
W_r = Σ_b a_rb · V_b
with V_b the shared basis matrices and a_rb the importance weights.

ComplEx: based on DistMult but embeds entities and relations into a complex vector space:
f_r(h, t) = Re(Σ_i h_i · r_i · t̄_i)
where t̄_i is the complex conjugate of t_i and we take the real part.
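Both bilinear scores side by side, again with random stand-in embeddings; the check shows the DistMult symmetry, and the conjugate is what lets ComplEx break it:

    import numpy as np

    rng = np.random.default_rng(2)
    h, r, t = rng.normal(size=(3, 8))

    def f_distmult(h, r, t):
        return np.sum(h * r * t)                # symmetric in h and t

    hc, rc, tc = rng.normal(size=(3, 8)) + 1j * rng.normal(size=(3, 8))

    def f_complex(h, r, t):
        return np.sum(h * r * np.conj(t)).real  # conjugate breaks the symmetry

    print(np.isclose(f_distmult(h, r, t), f_distmult(t, r, h)))   # True
    print(np.isclose(f_complex(hc, rc, tc), f_complex(tc, rc, hc)))  # generally False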
To compare graphs we look at statistics such as the degree distribution P(k), the probability that a randomly chosen node has degree k, and the path length (diameter), the maximum shortest-path distance.

Kronecker graphs: the Kronecker product takes every cell of M1 and multiplies it by the whole matrix M2; iterating it, K = K1 ⊗ K1 ⊗ … ⊗ K1, generates a self-similar graph.
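The definition in one numpy call: np.kron replaces each cell of M1 with that cell times all of M2, and iterating it grows a Kronecker graph (initiator matrix below is a toy choice):

    import numpy as np

    K1 = np.array([[1, 1],
                   [1, 0]])      # 2x2 initiator matrix (toy example)

    K = np.kron(K1, K1)          # each cell of K1 times the whole K1 -> 4x4
    K = np.kron(K, K1)           # iterate once more -> 8x8 Kronecker graph
    print(K.shape)               # (8, 8)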
Deep Generative Models for Graphs
Idea: we want to model a complex distribution over graphs and fit it by maximum likelihood, argmax_θ E[log p_θ(x)], decomposed via the chain rule.
We can then also use the model to generate new graphs.

GraphRNN: generates a graph sequentially by adding nodes and edges. This has become a sequence problem -> RNN. The model must remember any/all previous nodes, so we use BFS ordering!
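A sketch of why BFS ordering helps: with BFS node order, a new node can only connect back to the recent frontier, so the RNN needs only a bounded memory window. Plain BFS below on an assumed toy graph; the GraphRNN bookkeeping itself is omitted:

    from collections import deque

    adj = {0: [1, 2], 1: [0, 3], 2: [0, 3], 3: [1, 2, 4], 4: [3]}

    def bfs_order(start):
        """Node ordering by breadth-first traversal."""
        seen, order, queue = {start}, [], deque([start])
        while queue:
            v = queue.popleft()
            order.append(v)
            for u in adj[v]:
                if u not in seen:
                    seen.add(u)
                    queue.append(u)
        return order

    print(bfs_order(0))   # [0, 1, 2, 3, 4]: edges only reach back a short window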
Scaling up
The computation graph grows exponentially with depth via neighborhood aggregation, especially if we hit a hub node, so training takes a long time and much computation.
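A toy illustration of the blow-up and the usual node-wise sampling fix (as in GraphSAGE-style samplers); the random graph, fan-out of 10, and cap of 3 are all assumptions for the sketch:

    import random

    random.seed(0)
    n = 2000
    adj = {v: random.sample(range(n), 10) for v in range(n)}  # ~10 neighbors each

    def khop_size(start, k, cap=None):
        """Size of the k-hop computation graph, optionally with sampled fan-out."""
        frontier, seen = {start}, {start}
        for _ in range(k):
            nxt = set()
            for v in frontier:
                nbrs = adj[v]
                if cap is not None:                        # sample at most `cap` neighbors
                    nbrs = random.sample(nbrs, min(cap, len(nbrs)))
                nxt.update(u for u in nbrs if u not in seen)
            seen |= nxt
            frontier = nxt
        return len(seen)

    print(khop_size(0, 3))          # full 3-hop fan-out: roughly 10 + 100 + 1000 nodes
    print(khop_size(0, 3, cap=3))   # sampled: at most 3 + 9 + 27 new nodes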