0% found this document useful (0 votes)
28 views

GML 1

Uploaded by

asmachaoui0403
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF or read online on Scribd
0% found this document useful (0 votes)
28 views

GML 1

Uploaded by

asmachaoui0403
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF or read online on Scribd
You are on page 1/ 32
Traditional Feature-based Methods Traditional Feature-based Methods Graph Representation Graphs : A Common Language c - end i ey / rather) fn Actor 3) Albert Ces Protein 4 Cae @ Protein SO RO \ / ~o Protein 90) INI=4 / lela & Traditional Feature-based Methods Graph Representation How to build a graph m= What are nodes ? = What are edges? The choice of the proper network representation of a given domain/problem determines our ability to use networks successfully = In some cases, there is a unique, unambiguous representation = In other cases, the representation is by no means unique The way you assign links will determine the nature of the question you can study Traditional Feature-based Methods Graph properties Undirected vs Directed = Links: undirected = Links: directed (symmetrical, reciprocal) (arcs) ‘er = Examples: = Examples: " Collaborations = Phone calls ® Friendship on Facebook = Following on Twitter Traditional Feature-based Methods Graph properties Node degrees Node degree, k;: the number 3 of edges adjacent to node i 2 = k,=4 5 nus ne wg. degree: k=(k)=— g. deg Waa N a1 ! In directed networks we define an in-degree and out-degree. The (total) degree of a node is the sum of in- and out-degrees. Kad ket =1 Directed 3 Source:Nodewith =o = F_ Sink: Node with k=0 N Traditional Feature-based Methods Graph properties Bipartite graph = A bipartite graph is a graph whose nodes can be divided into two disjoint sets U and V such that every link connects a node in U to one in V; that is, U and V are independent sets = Examples: = Authors-to-Papers = Actors-to-Movies = Users-to-Movies Traditional Feature-based Methods Graph properties Bipartite graph . “ose Bipartite Graph Projection U Projection V Sah Traditional Feature-based Methods Graph properties Adjacency Matrix Ajj= 1 if thereis a link from node ito node j Aj=O otherwise o 1041 oo O01 f 00 i ef 00 Sparse 0001 ooo00 ae a) o110 Note that for a directed graph (right) the matrix is not symmetric. Traditional Feature-based Methods Graph properties Adjacency Matrix Most real-world networks are sparse E<< E... (or k<< N-1) ‘max NerwoR nNoDes ts ieecreo/ N t Unbinecreo aan | oso | 267 zesos | sues | 2st mas | 29,397,908] 23.72 ager | asaoare | 1048 1039 saz | sse Traditional Feature-based Methods Graph properties Edge List (representation) * (2,3) = (2,4) = (3, 2) eo ——— =(,4) = (4,5) = (5,2) * (5,1) Traditional Feature-based Methods Graph properties Adjacency List (representation) Easier to work with if network is "= Large fp = Sparse ® Allows us to quickly retrieve all neighbors of a given node at = 2:3,4 =3:2,4 "4:5 *5:1,2 Traditional Feature-based Methods Graph properties Node and Edge Attributes Weight (e.g,, frequency of communication) Ranking (best friend, second best friend...) = Type (friend, relative, co-worker) = Sign : friend vs Foe, Trust vs Distrust = Properties depending on the structure of the rest of the graph: Number of common friends Traditional Feature-based Methods Graph properties Node and Edge Attributes = Unweighted = Weighted (undirected) (undirected) 0 2 05 0 2 0 1 4 Ay= os 1 0 0 0 4 0 oOo) A,=0 E=LY nonzero(4,) Examples: Friendship, Hyperink Examples: Collaboration, Internet, Roads Traditional Feature-based Methods Graph properties Self-loops and Multigraph = Self-edges (self-loops) = Multigraph (undirected) (undirected) 1 1 0 0 lo 3 0 oJ Snonzero(A,) Examples: Proteins, Hyperlinks Examples: Communication, Collaboration Traditional Feature-based Methods Graph properties Connectivity of undirected graph = A graph is connected if any two nodes can be joined by a path = A disconnected graph is made up by two or more connected components 8 Fr Largest Component: Giant Component Isolated node (node 4) Traditional Feature-based Methods Graph properties Connectivity of undirected graph = The adjacency matrix of a network with several components can be written in a block-diagonal form, so that nonzero elements are confined to squares, with all other elements being zero g a oe a oo opm oo 0c Z ae z Meio z >. oa . Traditional Feature-based Methods Graph properties Connectivity of directed graph = Strongly connected : has a path from each node to every other node and vice versa & Weakly connected Graph on the left is connected but not strongly connected (e.g there is no way to get from F to G by following the edge directions) Traditional Feature-based Methods Machine Learning Tasks = Node-level prediction = Link-level prediction = Graph-level prediction Graph-level a Traditional Feature-based Methods Machine Learning Pipeline = Design features for nodes/links/graphs = Obtain features for all training data eR? ER ee oe See eR 3 features < “Graph features Traditional Feature-based Methods Machine Learning Pipeline = Trainan ML model: = Apply the model: * Random forest = Given a new = SVM node/link/graph, obtain = Neural network, etc. its features and make a prediction x > v1 , ‘ x —y xy Pw Traditional Feature-based Methods Feature Design Using effective features over graphs is the key to achieve good model performance In this lecture, we overview the traditional features for : ® Node-level prediction & Link-level prediction = Graph-level prediction For simplicity, we focus on undirected graphs. Traditional Feature-based Methods Node Features Node-Level Tasks > 2 ee, Pa e® ®@ Mact -e a a @ e@°@ Node classification Traditional Feature-based Methods Node Features Goal Characterize the structure and position of a node in the network Overview Node feature Node degree Node centrality Clustering coefficient Graphlets Traditional Feature-based Methods Node Features : Node degree = The degree k,, of node v is the number of edges (neighboring nodes) the node has. = Treats all neighboring nodes equally. Traditional Feature-based Methods Node Features : Node centrality = Node degree counts the neighboring nodes without capturing their importance. = Node centrality c, takes the node importance in a graph into account = Different ways to model importance: = Eigenvector centrality = Betweenness centrality = Closeness centrality = and many others... Traditional Feature-based Methods Node Features : Node centrality Eigenvector centrality = Anode v is important if surrounded by important neighboring nodes u € N(v). = We model the centrality of node v as the sum of the centrality of neighboring nodes: 1 =z eq u€N(v) 2 is normalization constant (it will turn out to be the largest eigenvalue of A) = Notice that the above equation models centrality in a recursive manner. How do we solve it? Traditional Feature-based Methods Node Features : Node centrality Eigenvector centrality = Rewrite the recursive equation in the matrix form. Ti «7 Du em de = Ac uEN(Y) + A: Adjacency matrix J is normalization const Aw= lifueN() + ¢: Centrality vector * Q: Eigenvalue = We see that centrality c is the eigenvector of A! = The largest eigenvalue A,,,,. is always positive and unique (by Perron-Frobenius Theorem). = The eigenvector C,,_, corresponding to Ajay is used for centrality. (largest eigenvalue of A) Traditional Feature-based Methods Node Features : Node centrality Betweenness centrality = Anode is important if it lies on many shortest paths between other nodes. #(shortest paths betwen s and t that contain v) = (shortest paths between s and t) sepet = Example: (A-C-B, ACD, A-C-D-E) op=8 (A-C-D-E, B-D-E, C-D-E) Traditional Feature-based Methods Node Features : Node centrality Closeness centrality = Anode is important if it has small shortest path lengths to all other nodes. 1 “= Dusv Shortest path length between u and v = Example: y= 1/(2+14+24+3)=1/8 (A-C-B, A-C, A-C-D, A-C-D-E) ¢p =1/2+14+14+1)=1/5 (D-C-A, D-B, D-C, D-E) Traditional Feature-based Methods Node Features : Clustering coefficient = Measures how connected v’s neighboring nodes are: #(edges among neighboring nodes e, = (edg B nei 8 2 € [0,1] 2 #(node pairs among k,, neighboring nodes) = Examples: In our examples below the denominator is 6 (4 choose 2). e=1 ey = 0.5 e,=0 Traditional Feature-based Methods Node Features : Graphlets = Observation: Clustering coefficient counts the #(triangles) in the “nee St IMI Noonan triangles (out of 6 node triplets) = We can generalize the above by counting #(pre-specified subgraphs, i.e., graphlets). Traditional Feature-based Methods Node Features : Graphlets Graphlets: Rooted connected non-isomorphic subgraphs: aoe Seema sieges TIA TANT?A 8 getewig WAAT LAT TSE G Go Gi Go Gs Ga Gs Ge Go Gs Gy SASL Cig There are 73 different graphlets on up to 5 nodes Traditional Feature-based Methods Node Features : Graphlets = Degree counts #(edges) that a node touches = Clustering coefficient counts #(triangles) that a node touches. = Graphlet Degree Vector (GDV): Graphlet-base features for nodes = GDV counts #(graphlets) that a node touches Traditional Feature-based Methods Node Features : Graphlets . Example: Possible graphlets up to size 3 G « » ¢ «TAA a Graphlet instances of node u: d a b & qe q- qe GDV of node u: waved oe Bis Traditional Feature-based Methods Node Features : Graphlets = Considering graphlets on 2 to 5 nodes we get: * Vector of 73 coordinates is a signature of a node that describes the topology of node's neighborhood = Captures its interconnectivities out to a distance of 4 hops = Graphlet degree vector provides a measure of a node’s local network topology: = Comparing vectors of two nodes provides a more detailed measure of local topological similarity than node degrees or clustering coefficient. Traditional Feature-based Methods Link Features = The task is to predict new links based on the existing links. = At test time, node pairs (with no existing links) are ranked, and top K node pairs are predicted. = The key is to design features for a pair of nodes. Traditional Feature-based Methods Link Features Two formulations of the link prediction task: = 1) Links missing at random: = Remove a random set of links and then aim to predict them = 2) Links over time: = Given G[tg, tj] a graph defined by edges up to time fo, output a ranked list L of edges (not in G[to, to]) that are predicted to appear in time G[t,, tt] at alta Evaluation: ® n= |Eyjoy|: # new edges that appear during the test period [¢,, t;] = Take top 7 elements of L and count correct edges Traditional Feature-based Methods Link Features = Methodology: = For each pair of nodes (x,y) compute score c(x,y) = For example, c(x,y) could be the # of common neighbors of xandy = Sort pairs (x,y) by the decreasing score c(x,y) = Predict top n pairs as new links = See which of these links actually appear in G[t,, tj] Traditional Feature-based Methods Link Features Overview = Distance-based feature = Local neighborhood overlap = Global neighborhood overlap Link feature...... Traditional Feature-based Methods Link Features : distance-based feature Shortest-path distance between two nodes = Example: Seu = Spe = Sap = 2 Seg = Spr = 3 = However, this does not capture the degree of neighborhood overlap: = Node pair (B, H) has 2 shared neighboring nodes, while pairs (B, E) and (A, B) only have 1 such node. Traditional Feature-based Methods Link Features : local neighborhood overlap Captures # neighboring nodes shared between two nodes v, and v2: = Common neighbors: |N(v,) NN(v,)| = Example: [N(A) 9 N(B)| = |{C}] = 1 oe NI iN = Jaccard’s coefficient; Mever@a! IN(@y)UN(2)| © Example: M@ON@! _ WC _ 4 PrtiNC@UN) IED) 2 Adamic-Adar index: 1 Zuew(v, IW) ogee fd log(ke) log 4 = Example: Traditional Feature-based Methods Link Features : local neighborhood overlap = Limitation of local neighborhood features: = Metric is always zero if the two nodes do not have any neighbors in common. = However, the two nodes may still potentially be connected in the future. = Global neighborhood overlap metrics resolve the limitation by considering the entire graph. Traditional Feature-based Methods Link Features: global neighborhood overlap = Katz index: count the number of walks of all lengths between a pair of nodes. = How to compute #walks between two nodes? = Use adjacency matrix powers! = Ay, specifies #walks of length 1 (direct neighborhood) between wu and v. . Aes specifies #walks of length 2 (neighbor of neighbor) between wand v. = And, Aly specifies #walks of length L. Traditional Feature-based Methods Link Features: global neighborhood overlap Powers of adjacence matrices = Computing #walks between two nodes = Recall: A, = 1 if u € N(v) = Let po = #walks of length K between u and v = We will show P™) = ak . po = #walks of length 1 (direct neighborhood) between wand v = Ayy PY =A, > I Bors er oReER Traditional Feature-based Methods Link Features: global neighborhood overlap Powers of adjacence matrices = How to compute pe) ? = Step 1: Compute #walks of length 1 between each of u’s neighbor and v = Step 2: Sum up these #walks across u's neighbors 2) 1 * Phe = Ladue * Ply) = Li Aue * Aw uy ‘valks of length 1 between Node 1's neighbors and Nod A? = Power of adjacency Traditional Feature-based Methods Link Features: global neighborhood overlap = Katz index between v, and 1 is calculated as Sum over all walk lengths #walks of length U Se = > [6] between 1 and v, '=1 9

You might also like