09-hetero
CS224W: Machine Learning with Graphs
Jure Leskovec, Stanford University
http://cs224w.stanford.edu
2 types of nodes:
Node type A: Paper nodes
Node type B: Author nodes
2 types of edges:
Edge type A: Cite
Edge type B: Like
A graph can have multiple types of nodes and edges! Here: 2 types of nodes + 2 types of edges.
Describing each edge by its relation type (start node type, edge type, end node type) gives 2 × 2 × 2 = 8 possible relation types!
Example: E-Commerce Graph
▪ Node types: User, Item, Query, Location, ...
▪ Edge types: Purchase, Visit, Guide, Search, …
▪ Different node types' feature spaces can be different!
Example: Academic Graph
▪ Node types: Author, Paper, Venue, Field, ...
▪ Edge types: Publish, Cite, …
▪ Benchmark dataset: Microsoft Academic Graph
Observation: We can also treat types of
nodes and edges as features
▪ Example: Add a one-hot indicator for nodes and
edges
▪ Append feature [1, 0] to each “author node”; Append
feature [0, 1] to each “paper node”
▪ Similarly, we can assign type-indicator features to edges of different types
▪ Then, a heterogeneous graph reduces to a standard graph (see the sketch below)
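A minimal PyTorch sketch of this reduction (toy sizes; the zero-padding to a common feature dimension is an added assumption, since the slide only describes the type indicator):

```python
import torch
import torch.nn.functional as F

# Toy features: 3 author nodes (4-dim) and 2 paper nodes (5-dim).
author_x = torch.randn(3, 4)
paper_x = torch.randn(2, 5)

# Zero-pad to a common dimension (an added assumption for this sketch).
d = max(author_x.size(1), paper_x.size(1))
author_x = F.pad(author_x, (0, d - author_x.size(1)))
paper_x = F.pad(paper_x, (0, d - paper_x.size(1)))

# Append one-hot type indicators: [1, 0] for authors, [0, 1] for papers.
author_x = torch.cat([author_x, torch.tensor([[1.0, 0.0]]).expand(3, -1)], dim=1)
paper_x = torch.cat([paper_x, torch.tensor([[0.0, 1.0]]).expand(2, -1)], dim=1)

# Stack into one node-feature matrix of a standard (homogeneous) graph.
x = torch.cat([author_x, paper_x], dim=0)  # shape: [5, d + 2]
```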
When do we need a heterogeneous graph?
▪ Case 1: Different node/edge types have different
shapes of features
▪ An "author node" has a 4-dim feature, while a "paper node" has a 5-dim feature
▪ Case 2: We know different relation types
represent different types of interactions
▪ (English, translate, French) and (English, translate,
Chinese) require different models
Ultimately, a heterogeneous graph is a more expressive graph representation
▪ Captures different types of interactions between
entities
But it also comes with costs
▪ More expensive (computation, storage)
▪ More complex implementation
There are many ways to convert a
heterogeneous graph to a standard graph
(that is, a homogeneous graph)
CS224W: Machine Learning with Graphs
Jure Leskovec, Stanford University
http://cs224w.stanford.edu
Kipf and Welling. Semi-Supervised Classification with Graph Convolutional Networks, ICLR 2017
[Figure: recap of a GCN layer — (1) message computation and (2) aggregation]
We will extend GCN to handle heterogeneous
graphs with multiple edge/relation types
We start with a directed graph with one relation
▪ How do we run GCN and update the representation of
the target node A on this graph?
[Figure: input graph with target node A and neighbors B, C, D, E, F]
What if the graph has multiple relation types?
[Figure: input graph where edges have relation types r1, r2, r3]
Use different neural network weights for different relation types: weights $\mathbf{W}_{r_1}$ for $r_1$, $\mathbf{W}_{r_2}$ for $r_2$, and $\mathbf{W}_{r_3}$ for $r_3$.
[Figure: input graph with relation types r1, r2, r3, each assigned its own weight matrix]
[Figure: computation graph for target node A — neighbor messages are transformed by relation-specific neural networks and then aggregated]
Introduce a set of neural networks for each
relation type!
Relational GCN (RGCN):
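In the original formulation (Schlichtkrull et al., 2018), the RGCN layer update can be written as:

$$\mathbf{h}_v^{(l+1)} = \sigma\Big(\sum_{r \in R} \sum_{u \in N_v^r} \frac{1}{c_{v,r}} \mathbf{W}_r^{(l)} \mathbf{h}_u^{(l)} + \mathbf{W}_0^{(l)} \mathbf{h}_v^{(l)}\Big), \qquad c_{v,r} = |N_v^r|$$

where $N_v^r$ denotes the neighbors of node $v$ under relation $r$ and $\mathbf{W}_0^{(l)}$ transforms the self-loop.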
Each relation has $L$ weight matrices: $\mathbf{W}_r^{(1)}, \mathbf{W}_r^{(2)}, \dots, \mathbf{W}_r^{(L)}$
The size of each $\mathbf{W}_r^{(l)}$ is $d^{(l+1)} \times d^{(l)}$, where $d^{(l)}$ is the hidden dimension in layer $l$
[Figure: $\mathbf{W}_r^{(l)}$ drawn with a block-diagonal structure]
Limitation: only nearby neurons/dimensions can interact through $\mathbf{W}_r^{(l)}$
[Figure: input graph with relation types r1, r2, r3]
Training:
1. Use RGCN to score the training supervision edge $(E, r_3, A)$
2. Create a negative edge by perturbing the supervision edge
• Corrupt the tail of $(E, r_3, A)$, e.g., $(E, r_3, B)$, $(E, r_3, D)$
[Figure: input graph; the training supervision edge $(E, r_3, A)$ is shown in bold]
$\sigma$ … sigmoid function (used in the objective sketched below)
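A sketch of the resulting objective, assuming the standard negative-sampling cross-entropy with a relation-specific score function $f_r(\mathbf{h}_u, \mathbf{h}_v)$:

$$\ell = -\log \sigma\big(f_{r_3}(\mathbf{h}_E, \mathbf{h}_A)\big) - \log\big(1 - \sigma(f_{r_3}(\mathbf{h}_E, \mathbf{h}_B))\big)$$

i.e., maximize the score of the training supervision edge and minimize the score of the negative edge.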
Evaluation:
▪ Validation time as an example; the procedure is the same at test time
Evaluate how well the model can predict the validation edges with their relation types.
Let's predict the validation edge $(E, r_3, D)$
Intuition: the score of $(E, r_3, D)$ should be higher than the score of all $(E, r_3, v)$, where $(E, r_3, v)$ is NOT in the training message edges and training supervision edges, e.g., $(E, r_3, B)$ (see the sketch below)
[Figure: input graph; the validation edge $(E, r_3, D)$ is shown as a dashed $r_3$ edge]
Validation edges: $(E, r_3, D)$
Training message edges & training supervision edges: all existing edges (solid lines)
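A minimal PyTorch sketch of this ranking-style evaluation, assuming a filtered ranking metric such as reciprocal rank or Hits@k; the node names and scores below are made up:

```python
import torch

# Hypothetical scores f_{r3}(h_E, h_v) for every candidate tail node v.
candidate_nodes = ["A", "B", "C", "D", "F"]
scores = torch.tensor([1.2, 0.4, -0.3, 0.9, 0.1])

# Filtered setting: exclude tails already connected to E via r3 by a
# training message/supervision edge (assumed here to be A and B).
known_tails = {"A", "B"}
true_tail = "D"  # the validation edge (E, r3, D)

keep = [v == true_tail or v not in known_tails for v in candidate_nodes]
kept_nodes = [v for v, k in zip(candidate_nodes, keep) if k]
kept_scores = scores[torch.tensor(keep)]

# Rank the validation edge among the remaining candidates (1 = best).
order = kept_scores.argsort(descending=True).tolist()
rank = [kept_nodes[i] for i in order].index(true_tail) + 1

reciprocal_rank = 1.0 / rank
hits_at_1 = float(rank <= 1)
```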
Benchmark dataset
▪ ogbn-mag from Microsoft Academic Graph (MAG)
Four (4) types of entities
▪ Papers: 736k nodes
▪ Authors: 1.1m nodes
▪ Institutions: 9k nodes
▪ Fields of study: 60k nodes
Wang et al. Microsoft academic graph: When experts are not enough. Quantitative Science Studies, 2020.
Benchmark dataset
▪ ogbn-mag from Microsoft Academic Graph (MAG)
Four (4) directed relations
▪ An author is "affiliated with" an institution
▪ An author "writes" a paper
▪ A paper "cites" a paper
▪ A paper "has a topic of" a field of study
Prediction task
▪ Each paper has a 128-dimensional word2vec feature vector
▪ Given the content, references, authors, and author affiliations
from ogbn-mag, predict the venue of each paper
▪ 349-class classification problem, since 349 venues are considered
Time-based dataset splitting
▪ Training set: papers published before 2018
▪ Test set: papers published after 2018
Benchmark results:
[Figure: ogbn-mag leaderboard accuracy, comparing R-GCN with SOTA methods]
Relational GCN, a graph neural network for
heterogeneous graphs
Hu et al. Heterogeneous Graph Transformer. WWW 2020.
[Figure (Hu et al.): Heterogeneous Graph Transformer — attention is computed with node-type-specific Q-Linear / K-Linear projections (e.g., for Paper and Author nodes), combined with relation-specific terms for edge types such as Write and Cite]
[Figure: a general GNN layer design — (1) message computation, (2) aggregation, and (3) layer connectivity between GNN Layer 1 and GNN Layer 2, centered on node v]
(1) Heterogeneous message computation
▪ Message function: $\mathbf{m}_u^{(l)} = \mathrm{MSG}_r^{(l)}(\mathbf{h}_u^{(l-1)})$
▪ Observation: A node could receive multiple types of messages. The number of message types equals the number of relation types.
▪ Idea: Create a different message function for each relation type
▪ $\mathbf{m}_u^{(l)} = \mathrm{MSG}_r^{(l)}(\mathbf{h}_u^{(l-1)})$, where $r = (u, e, v)$ is the relation type between node $u$ that sends the message, edge type $e$, and node $v$ that receives the message
▪ Example: a linear layer $\mathbf{m}_u^{(l)} = \mathbf{W}_r \mathbf{h}_u^{(l-1)}$ (see the sketch below)
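A minimal PyTorch sketch of per-relation message functions (the module and relation names here are hypothetical):

```python
import torch
import torch.nn as nn

# One message function per relation type: m_u = W_r h_u (no bias), as in the
# linear-layer example above.
class HeteroMessage(nn.Module):
    def __init__(self, relation_types, in_dim, out_dim):
        super().__init__()
        self.msg = nn.ModuleDict({
            r: nn.Linear(in_dim, out_dim, bias=False) for r in relation_types
        })

    def forward(self, h_u, relation):
        # m_u^(l) = MSG_r^(l)(h_u^(l-1)) = W_r h_u^(l-1)
        return self.msg[relation](h_u)

msg_fn = HeteroMessage(["writes", "cites"], in_dim=8, out_dim=16)
m = msg_fn(torch.randn(5, 8), "writes")  # messages from 5 sender nodes
```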
(2) Aggregation
▪ Intuition: Node $v$ aggregates the messages from its neighbors $N(v)$:
$\mathbf{h}_v^{(l)} = \mathrm{AGG}^{(l)}\big(\{\mathbf{m}_u^{(l)}, u \in N(v)\}\big)$
▪ Example: Sum(⋅), Mean(⋅), or Max(⋅) aggregator
▪ $\mathbf{h}_v^{(l)} = \mathrm{Sum}\big(\{\mathbf{m}_u^{(l)}, u \in N(v)\}\big)$
(2) Heterogeneous Aggregation
▪ Observation: Each node could receive multiple types of
messages from its neighbors, and multiple neighbors
may belong to each message type.
▪ Idea: We can define a 2-stage message passing: first aggregate within each relation type, then aggregate across relation types (a sketch follows)
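In symbols, a sketch of the 2-stage aggregation consistent with the per-relation messages above:

$$\mathbf{h}_v^{(l)} = \mathrm{AGG}_{\text{all}}^{(l)}\Big(\mathrm{AGG}_r^{(l)}\big(\{\mathbf{m}_u^{(l)}, u \in N_r(v)\}\big)\Big)$$

where $N_r(v)$ are the neighbors of $v$ connected by relation type $r$: messages are first aggregated within each relation type, and the per-relation results are then aggregated across relation types.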
Heterogeneous pre/post-process layers:
▪ MLP layers applied with respect to each node type, since the outputs of the GNN are node embeddings
▪ $\mathbf{h}_v^{(l)} = \mathrm{MLP}_{T(v)}(\mathbf{h}_v^{(l)})$, where $T(v)$ is the type of node $v$ (see the sketch below)
Other successful GNN designs are
also encouraged for heterogeneous
GNNs: skip connections, batch/layer
normalization, …
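A minimal PyTorch sketch of node-type-specific post-processing MLPs (module and node-type names are hypothetical):

```python
import torch
import torch.nn as nn

# One MLP per node type T(v), applied to that type's embeddings.
class TypewiseMLP(nn.Module):
    def __init__(self, node_types, dim):
        super().__init__()
        self.mlp = nn.ModuleDict({
            t: nn.Sequential(nn.Linear(dim, dim), nn.ReLU(), nn.Linear(dim, dim))
            for t in node_types
        })

    def forward(self, h_dict):
        # h_dict maps node type -> embedding matrix of all nodes of that type
        return {t: self.mlp[t](h) for t, h in h_dict.items()}

post = TypewiseMLP(["author", "paper"], dim=16)
out = post({"author": torch.randn(3, 16), "paper": torch.randn(2, 16)})
```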
Graph Feature manipulation
▪ The input graph lacks features → feature
augmentation
Graph Structure manipulation
▪ The graph is too sparse → Add virtual nodes / edges
▪ The graph is too dense → Sample neighbors when
doing message passing
▪ The graph is too large → Sample subgraphs to
compute embeddings
▪ Will cover later in lecture: Scaling up GNNs
Graph Feature manipulation
▪ 2 Common options: compute graph statistics (e.g.,
node degree) within each relation type, or across the
full graph (ignoring the relation types)
Graph Structure manipulation
▪ Neighbor and subgraph sampling are also common
for heterogeneous graphs.
▪ 2 common options: sampling within each relation type (ensuring neighbors of each type are covered), or sampling across the full graph
Node-level prediction: $\hat{y}_v = \mathrm{Head}_{\mathrm{node}}(\mathbf{h}_v^{(L)})$
Edge-level prediction: $\hat{y}_{uv} = \mathrm{Linear}(\mathrm{Concat}(\mathbf{h}_u^{(L)}, \mathbf{h}_v^{(L)}))$
Graph-level prediction: $\hat{y}_G = \mathrm{Head}_{\mathrm{graph}}(\{\mathbf{h}_v^{(L)}, \forall v \in G\})$
For heterogeneous graphs, the prediction heads become type-specific:
Node-level prediction: $\hat{y}_v = \mathrm{Head}_{\mathrm{node},T(v)}(\mathbf{h}_v^{(L)})$
Edge-level prediction: $\hat{y}_{uv} = \mathrm{Linear}_r(\mathrm{Concat}(\mathbf{h}_u^{(L)}, \mathbf{h}_v^{(L)}))$ (see the sketch below)
Graph-level prediction: $\hat{y}_G = \mathrm{AGG}\big(\mathrm{Head}_{\mathrm{graph},i}(\{\mathbf{h}_v^{(L)} \in \mathbb{R}^d, \forall\, T(v) = i\})\big)$
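A minimal PyTorch sketch of the relation-specific edge-level head, i.e., Linear_r over the concatenated endpoint embeddings (module and relation names are hypothetical):

```python
import torch
import torch.nn as nn

# A relation-specific edge head: score = Linear_r(Concat(h_u, h_v)).
class HeteroEdgeHead(nn.Module):
    def __init__(self, relation_types, dim, out_dim=1):
        super().__init__()
        self.head = nn.ModuleDict({
            r: nn.Linear(2 * dim, out_dim) for r in relation_types
        })

    def forward(self, h_u, h_v, relation):
        return self.head[relation](torch.cat([h_u, h_v], dim=-1))

head = HeteroEdgeHead(["cites", "writes"], dim=16)
score = head(torch.randn(4, 16), torch.randn(4, 16), "cites")  # 4 candidate edges
```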
J. You, R. Ying, J. Leskovec. Design Space of Graph Neural Networks, NeurIPS 2020
[Figure: recap of the general GNN layer design — (1) message, (2) aggregation, (3) layer connectivity across GNN Layers 1 and 2]