CS514 2024 Fall Midterm Exam
IMPORTANT Notes
• Please write your solutions directly under each problem, with brief explanations where
necessary.
• The exam starts at 9:30 am and ends at 10:45 am. Feel free to skip some 'hard' problems and
allocate your time wisely.
[Score grid: Problems 1-6 and Total]
Problem 1. (22 pts) Random Walk with Restart. Consider the undirected graph shown in
Figure 1. We want to apply the power-iteration ("OnTheFly") method of random walk with restart
to calculate proximity. Power iteration can be written as r ← (1 − c) · W̃r + c · e, where c ∈ (0, 1)
is the restart probability, W̃ is the normalized adjacency matrix, and e is the starting vector.
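A minimal Python sketch of this iteration, for concreteness. Figure 1 is not reproduced in this
copy, so the 4-node path graph, the restart probability c = 0.15, and the starting node are
stand-in assumptions, not the exam's actual values; both normalizations asked about in parts 1
and 2 are computed for reference.

```python
import numpy as np

# Stand-in graph: a 4-node path (Figure 1 is not reproduced in this copy).
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)

d = A.sum(axis=1)                                    # node degrees
W_sym = np.diag(d ** -0.5) @ A @ np.diag(d ** -0.5)  # symmetric: D^-1/2 A D^-1/2
W_row = A / d[:, None]                               # row-normalized: D^-1 A

c = 0.15                                             # assumed restart probability
e = np.zeros(4); e[0] = 1.0                          # starting vector: node 0
r = e.copy()
for _ in range(100):                                 # r <- (1 - c) * W~ r + c * e
    r = (1 - c) * (W_sym @ r) + c * e
print(r)                                             # proximity of each node to node 0
```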
1. (9 pts) What is the graph's adjacency matrix? What is its degree matrix? What is its
symmetrically normalized adjacency matrix?
2. (5 pts) If the random-walk probability from one node to another is split evenly according
to the degree of the source node, how should we normalize the adjacency matrix to describe
this case? Show the resulting normalized adjacency matrix.
3. (3 pts) If we want to measure the proximity between node 1 and other nodes, how should we
initialize the ranking vector r?
4. (5 pts) If we set the restart probability to 0 (c = 0), is convergence of the power iteration
guaranteed for every row-normalized adjacency matrix? If so, prove the convergence. If not,
give a row-normalized adjacency matrix and an initialization of r for which power iteration
with c = 0 does not converge. (See the sketch below.)
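The numerical probe referenced in part 4; a sketch of one way to test convergence empirically,
not a model answer, and the two-node example is my own choice.

```python
import numpy as np

# Two nodes joined by one edge: the row-normalized adjacency matrix is a
# permutation matrix, so with c = 0 the iterate r <- W~ r swaps its entries
# at every step instead of settling down.
W = np.array([[0.0, 1.0],
              [1.0, 0.0]])              # row-normalized (each row sums to 1)
r = np.array([1.0, 0.0])                # initialization concentrated on node 0
for step in range(4):
    r = W @ r                           # power iteration with c = 0
    print(step, r)                      # alternates between [0, 1] and [1, 0]
```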
Problem 2. (18 pts) Matrix Low-Rank Factorization.
1. (2 pts) Among SVD, CMD, and Colibri-S, which one gives the best approximation as
measured by squared error? (A numerical comparison sketch follows this problem.)
3. (10 pts) For the same matrix as in Problem 2.2, if we apply Colibri-S instead of SVD, with
sampled indices I = {0, 1, 2} and threshold ϵ = 0.5, what are the matrices L, M, and R such
that A ≈ LMR is the low-rank decomposition of A? What is the squared reconstruction
error?
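The comparison sketch referenced in part 1. Since the matrix from Problem 2.2 is not reproduced
in this copy, a random matrix stands in for it, and the column-sampling step is a generic
CUR-style projection for illustration, not the exact CMD or Colibri-S procedure.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((6, 6))          # stand-in for the missing matrix
k = 3                                    # target rank

# Rank-k truncated SVD: optimal in squared (Frobenius) error by Eckart-Young.
U, s, Vt = np.linalg.svd(A)
A_svd = (U[:, :k] * s[:k]) @ Vt[:k, :]
print("SVD error:", np.linalg.norm(A - A_svd, "fro") ** 2)

# Generic column sampling: project A onto the span of k sampled columns.
# (Illustrative only; CMD and Colibri-S differ in how columns are chosen
# and deduplicated.)
I = [0, 1, 2]                            # sampled column indices
C = A[:, I]
A_cx = C @ np.linalg.pinv(C) @ A         # best approximation within span(C)
print("Column-sample error:", np.linalg.norm(A - A_cx, "fro") ** 2)
```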
Problem 3. (10 pts) Tensor Tools. Given the following tensor A.
Problem 4. (16 pts) Large-scale Information Network Embedding (LINE).
1. (6 pts) LINE defines the objective function of first-order proximity as

       O1 = − Σ_{(i,j)∈E} wij log p1(i, j),

   where

       p1(i, j) = 1 / (1 + exp(−ui · uj))

   and ui and uj are the embeddings of nodes i and j, respectively. What is the trivial
   solution for the 1-dimensional embeddings u that minimizes O1? If we want to avoid this
   trivial solution, what technique can we apply (give one)? (See the sketch below.)
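The sketch referenced above evaluates O1 on a toy graph. The triangle graph, the unit edge
weights, and the all-equal embedding family scaled by t are illustrative assumptions; watching
how O1 behaves as t grows hints at the degenerate minimizer.

```python
import numpy as np

def O1(u, edges, w):
    """First-order LINE objective: O1 = -sum_(i,j) w_ij * log p1(i, j)."""
    total = 0.0
    for (i, j), wij in zip(edges, w):
        p = 1.0 / (1.0 + np.exp(-u[i] * u[j]))  # p1(i, j) for 1-d embeddings
        total -= wij * np.log(p)
    return total

edges = [(0, 1), (1, 2), (2, 0)]   # assumed toy triangle graph
w = [1.0, 1.0, 1.0]                # assumed unit edge weights
for t in [0.1, 1.0, 10.0]:
    u = t * np.ones(3)             # all-equal 1-d embeddings scaled by t
    print(t, O1(u, edges, w))      # O1 shrinks toward 0 as t grows
```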
Problem 5. (16 pts) Graph Convolutional Network (GCN). Consider the undirected graph
shown in Figure 3.
2. (10 pts) The given graph has one-dimensional node features X = [0, 1, 0, 1]. A one-layer
   GCN with no activation is defined as

       Z = D̃^(−1/2) Ã D̃^(−1/2) X θ,

   where Z is the output. For node labels Y = [0, 1, 0, 1], what θ minimizes the mean
   squared error? (See the sketch below.)
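The sketch referenced above works through the one-layer GCN computation. Figure 3 is not
reproduced in this copy, so the 4-cycle adjacency matrix is an assumption; with a scalar θ,
minimizing the mean squared error reduces to one-dimensional least squares.

```python
import numpy as np

# Assumed graph: a 4-cycle (Figure 3 is not reproduced in this copy).
A = np.array([[0, 1, 0, 1],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [1, 0, 1, 0]], dtype=float)
A_tilde = A + np.eye(4)                 # add self-loops: A~ = A + I
d = A_tilde.sum(axis=1)
S = np.diag(d ** -0.5) @ A_tilde @ np.diag(d ** -0.5)  # D~^-1/2 A~ D~^-1/2

X = np.array([0.0, 1.0, 0.0, 1.0])      # one-dimensional node features
Y = np.array([0.0, 1.0, 0.0, 1.0])      # node labels
h = S @ X                               # Z = h * theta for scalar theta
theta = (h @ Y) / (h @ h)               # closed-form least-squares minimizer
print(theta, h * theta)                 # optimized theta and the predictions Z
```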
Problem 6. (18 pts) OddBall. According to the paper Akoglu, Leman, Mary McGlohon, and
Christos Faloutsos. “Oddball: Spotting anomalies in weighted graphs.” PAKDD 2010, answer the
following questions.
1. (3 pts) For an undirected, weighted graph with the following adjacency matrix, how many
nodes and how many edges, respectively, are in the ego-net of node 1 (node indices start
from 0)? (See the first sketch after this problem.)
       [ 0  0  1  2 ]
       [ 0  0  1  2 ]
       [ 1  1  0  2 ]
       [ 2  2  2  0 ]
2. (10 pts) For a node's undirected, unweighted ego-net, assume there are no connections
among the node's neighbors. What anomaly type is this? Find the relationship between λ
and N for this ego-net, where λ is the principal eigenvalue of the ego-net's weighted
adjacency matrix and N is the number of the node's neighbors. (See the second sketch after
this problem.)
3. (5 pts) What is the anomaly type of the graph with the following adjacency matrix (the center
node’s index is 0)? Explain why.
       [ 0     1     1000  1 ]
       [ 1     0     1     0 ]
       [ 1000  1     0     1 ]
       [ 1     0     1     0 ]
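First sketch (part 1): one way to read an ego-net off a weighted adjacency matrix, taking the
ego-net to be the subgraph induced by the node together with its neighbors, as in the OddBall
paper.

```python
import numpy as np

A = np.array([[0, 0, 1, 2],
              [0, 0, 1, 2],
              [1, 1, 0, 2],
              [2, 2, 2, 0]], dtype=float)  # weighted adjacency from part 1

ego = 1
members = [ego] + [j for j in range(len(A)) if A[ego, j] > 0]  # ego + neighbors
sub = A[np.ix_(members, members)]              # induced subgraph = ego-net
n_nodes = len(members)
n_edges = int(np.count_nonzero(np.triu(sub)))  # undirected: upper triangle only
print(n_nodes, n_edges)
```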
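Second sketch (part 2): a numerical exploration of λ against N for star-shaped ego-nets (the
ego connected to N neighbors, no edges among the neighbors). Comparing the printed λ values
with simple functions of N suggests the closed-form relationship the question asks for.

```python
import numpy as np

for N in [2, 4, 9, 16]:
    A = np.zeros((N + 1, N + 1))
    A[0, 1:] = 1.0                          # unit-weight edges ego -> neighbors
    A[1:, 0] = 1.0                          # symmetric: undirected star
    lam = max(abs(np.linalg.eigvalsh(A)))   # principal eigenvalue
    print(N, lam)                           # inspect lam as a function of N
```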