Graph Based Data Science
Opportunities, challenges and techniques
Outline
• Graphs for Data Representation
• Overview of Graph Based Data Science
• Graph Analysis Algorithms
• Graph Aware ML/AI techniques
• Graph Kernels
• Graph Embeddings
• Graph Neural Networks
• Use Cases
• Image Processing [29]
• Entity Resolution/deduplication
• Cybersecurity [30]
• Fraud Detection
• etc.
Graph Algorithms
Community Detection [11]
• Algorithms:
• Louvain [6]
• Label Propagation [7]
• many more [8]
• Use Cases:
• Social network analysis: e.g., extracting topics from online social platforms [9]
• Criminology: identifying groups in criminal networks
• Public Health: discovering the dynamics of groups susceptible to an epidemic
disease
• Customer Segmentation: identifying groups of people who use the same
financial service
• Bioinformatics: investigating the human brain and finding hierarchical
community structures within the brain’s functional network [10]
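As an illustration of the algorithms listed above, here is a minimal sketch of label propagation, assuming an undirected graph stored as a dict of neighbor lists. The names `label_propagation` and `communities`, the tie-breaking rule, and the toy graph are assumptions made for this example and are not taken from the cited work.

```python
import random
from collections import Counter


def label_propagation(adjacency, max_iters=100, seed=0):
    """Give every node the label that is most frequent among its neighbors."""
    rng = random.Random(seed)
    labels = {node: node for node in adjacency}  # each node starts in its own community
    nodes = list(adjacency)
    for _ in range(max_iters):
        rng.shuffle(nodes)  # asynchronous updates in random order
        changed = False
        for node in nodes:
            neighbors = adjacency[node]
            if not neighbors:
                continue  # isolated nodes keep their own label
            counts = Counter(labels[nbr] for nbr in neighbors)
            top = max(counts.values())
            candidates = sorted(lab for lab, c in counts.items() if c == top)
            if labels[node] in candidates:
                continue  # current label is already a majority label
            labels[node] = rng.choice(candidates)
            changed = True
        if not changed:
            break  # every node agrees with its neighborhood
    return labels


def communities(labels):
    """Group nodes that ended up with the same label."""
    groups = {}
    for node, lab in labels.items():
        groups.setdefault(lab, set()).add(node)
    return list(groups.values())


# Toy graph: two triangles joined by the single edge c-d.
graph = {
    "a": ["b", "c"], "b": ["a", "c"], "c": ["a", "b", "d"],
    "d": ["c", "e", "f"], "e": ["d", "f"], "f": ["d", "e"],
}
print(communities(label_propagation(graph)))
```

The result depends on the update order, so different seeds can give different groupings on the same graph.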
Graph Algorithms
Node Similarity
Based on:
• Node’s own features
• Node’s own and neighbors’ ID/features
• Similar neighbors (recursive definition)
• Graph topology
• Any combination of the above
Use cases:
• Entity matching/resolution [31]
• Recommendations (e.g., ‘Customers who bought this item also
bought...’)
• etc.
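One simple way to score similarity from shared neighbors is the Jaccard index of the two neighbor sets. The sketch below assumes a bipartite customer-to-items mapping; the data and the function names `jaccard_similarity` and `most_similar` are hypothetical.

```python
def jaccard_similarity(neighbors, u, v):
    """Size of the shared neighborhood divided by the size of the combined one."""
    nu, nv = set(neighbors[u]), set(neighbors[v])
    union = nu | nv
    if not union:
        return 0.0  # two nodes without neighbors share nothing
    return len(nu & nv) / len(union)


def most_similar(neighbors, node):
    """Rank all other nodes by their Jaccard similarity to `node`."""
    scores = [(jaccard_similarity(neighbors, node, other), other)
              for other in neighbors if other != node]
    return sorted(scores, reverse=True)


# Hypothetical purchase graph: customer -> items bought.
purchases = {
    "alice": {"book", "lamp", "pen"},
    "bob": {"book", "pen"},
    "carol": {"sofa"},
}
print(most_similar(purchases, "alice"))
```

This variant uses neighbor identities only; node features or a recursive definition of similarity would need a different measure.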
Graph Algorithms
Centrality [11]
• Variations:
• Degree Centrality: nodes with many direct connections
• Closeness Centrality: nodes that can reach all other nodes in few hops
• Betweenness Centrality: nodes that lie on the shortest paths between many
pairs of other nodes
• Eigenvector Centrality: nodes that are connected, directly or transitively,
to other important nodes
• Use Cases:
• Identification of social media influencers [12, 13]
• Fighting fraud and terrorism [14, 15, 16]
• Predicting flow in traffic, delivery and telecommunications systems [17, 18]
• etc.
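The first two variations can be sketched in a few lines, assuming an unweighted, undirected graph with at least two nodes stored as a dict of neighbor lists. The normalizations used here (degree divided by n - 1, and reachable nodes divided by total distance) are common conventions chosen for this example, not something the slides prescribe.

```python
from collections import deque


def bfs_distances(adjacency, source):
    """Hop distance from `source` to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nbr in adjacency[node]:
            if nbr not in dist:
                dist[nbr] = dist[node] + 1
                queue.append(nbr)
    return dist


def degree_centrality(adjacency):
    """Fraction of the other nodes each node is directly connected to."""
    n = len(adjacency)
    return {node: len(nbrs) / max(n - 1, 1) for node, nbrs in adjacency.items()}


def closeness_centrality(adjacency):
    """Reachable nodes divided by the sum of hop distances to them."""
    result = {}
    for node in adjacency:
        dist = bfs_distances(adjacency, node)
        total = sum(dist.values())
        result[node] = (len(dist) - 1) / total if total > 0 else 0.0
    return result


# Star graph: the hub is directly connected to every leaf.
star = {"hub": ["a", "b", "c"], "a": ["hub"], "b": ["hub"], "c": ["hub"]}
print(degree_centrality(star))
print(closeness_centrality(star))
```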
Graph Algorithms
Path Finding [5]
• Algorithms
• Shortest Path
• A*
• Yen’s k-shortest paths
• All pairs shortest path
• Single source shortest path
• Minimum Spanning Tree
• Random Walks
• Use cases
• Directions between two physical locations
• Degrees of separation between people
• Travel Planning [20, 21]
• Financial Analysis [19]
• Graph Embeddings**
• etc.
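For the single source shortest path case, a minimal sketch of Dijkstra's algorithm is given below. It assumes non-negative edge weights and a graph stored as a dict of dicts; the road network and the helper `reconstruct_path` are invented for illustration.

```python
import heapq


def dijkstra(graph, source):
    """Shortest distance from `source` to every reachable node."""
    dist = {source: 0}
    prev = {}  # predecessor of each node on its shortest path
    heap = [(0, source)]
    while heap:
        d, node = heapq.heappop(heap)
        if d > dist.get(node, float("inf")):
            continue  # stale heap entry, a shorter route was already found
        for nbr, weight in graph.get(node, {}).items():
            candidate = d + weight
            if candidate < dist.get(nbr, float("inf")):
                dist[nbr] = candidate
                prev[nbr] = node
                heapq.heappush(heap, (candidate, nbr))
    return dist, prev


def reconstruct_path(prev, source, target):
    """Walk the predecessor links back from `target` to `source`."""
    path = [target]
    while path[-1] != source:
        if path[-1] not in prev:
            return None  # target is not reachable from source
        path.append(prev[path[-1]])
    return path[::-1]


# Hypothetical road network with travel times in minutes.
roads = {
    "home": {"cafe": 4, "park": 1},
    "park": {"cafe": 2, "office": 7},
    "cafe": {"office": 3},
    "office": {},
}
dist, prev = dijkstra(roads, "home")
print(dist["office"], reconstruct_path(prev, "home", "office"))
```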
Graph Aware ML/AI
Techniques
Graph Kernels [32]
Graph Kernels can be intuitively understood as functions
quantifying the similarity of pairs of graphs.
• ML/AI
• Allow kernelized ML algorithms (e.g., SVMs) to work directly on graphs
• Can be employed to learn node embeddings [41]**
• Node embeddings
• Whole graph embeddings [23]
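To make the idea of a function that quantifies the similarity of two graphs concrete, here is a deliberately simple kernel: each graph is mapped to a histogram of node degrees, and the kernel value is the dot product of the two histograms. This is an illustrative choice for non-empty graphs, not one of the kernels surveyed in the cited reference, and the function names are assumptions.

```python
import math
from collections import Counter


def degree_histogram(adjacency):
    """Feature map: how many nodes have each degree."""
    return Counter(len(nbrs) for nbrs in adjacency.values())


def degree_histogram_kernel(g1, g2):
    """Dot product of the two degree histograms."""
    h1, h2 = degree_histogram(g1), degree_histogram(g2)
    return sum(count * h2[deg] for deg, count in h1.items())


def normalized_kernel(g1, g2):
    """Cosine-style normalization so that k(G, G) == 1."""
    scale = math.sqrt(degree_histogram_kernel(g1, g1) * degree_histogram_kernel(g2, g2))
    return degree_histogram_kernel(g1, g2) / scale


triangle = {"a": ["b", "c"], "b": ["a", "c"], "c": ["a", "b"]}
square = {"a": ["b", "d"], "b": ["a", "c"], "c": ["b", "d"], "d": ["a", "c"]}
path = {"a": ["b"], "b": ["a", "c"], "c": ["b"]}

print(normalized_kernel(triangle, square))
print(normalized_kernel(triangle, path))
```

The triangle and the square receive the maximum score because every node in both has degree 2. This shows how coarse a degree-only kernel is, and why richer kernels compare larger substructures.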
Node Embeddings [24]
Each node is represented with a continuous vector of a
fixed dimensionality. These vectors are used for
visualization or prediction at the node level, e.g., link
prediction.
Popular techniques:
• Random Walk Based Approaches:
• DeepWalk [42]: uses random walks to produce embeddings
• node2vec [43]: a modification of DeepWalk that provides control over the
explored neighborhoods
• Deep Learning Approaches
• GraphSAGE [45]: an inductive, scalable technique
• Graph Neural Networks**
Node Embeddings [24]
DeepWalk [42]
• Context construction
• Information Aggregation
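The context construction step can be sketched as uniform random walks whose nodes are then paired within a sliding window, much like words in a sentence. The code below covers only that step; the aggregation step, in which a skip-gram style model is trained on the pairs, is omitted. The function names, parameters, and graph are assumptions for illustration.

```python
import random


def random_walk(adjacency, start, length, rng):
    """Uniform random walk visiting at most `length` nodes."""
    walk = [start]
    while len(walk) < length:
        neighbors = adjacency[walk[-1]]
        if not neighbors:
            break  # dead end, stop early
        walk.append(rng.choice(neighbors))
    return walk


def build_corpus(adjacency, walks_per_node, walk_length, seed=0):
    """Start `walks_per_node` walks from every node, in shuffled order."""
    rng = random.Random(seed)
    nodes = list(adjacency)
    corpus = []
    for _ in range(walks_per_node):
        rng.shuffle(nodes)
        for node in nodes:
            corpus.append(random_walk(adjacency, node, walk_length, rng))
    return corpus


def context_pairs(walk, window):
    """(center, context) pairs for nodes at most `window` steps apart in a walk."""
    pairs = []
    for i, center in enumerate(walk):
        for j in range(max(0, i - window), min(len(walk), i + window + 1)):
            if j != i:
                pairs.append((center, walk[j]))
    return pairs


graph = {"a": ["b", "c"], "b": ["a", "c"], "c": ["a", "b", "d"], "d": ["c"]}
corpus = build_corpus(graph, walks_per_node=2, walk_length=5)
print(corpus[0], context_pairs(corpus[0], window=1)[:4])
```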
Node Embeddings [24, 47, 48]
GraphSAGE [45]
• Context construction
• the neighborhood of a node consists of the nodes up to K hops away from
that node
• Assumption: nodes that reside in the same neighborhood should have similar
embeddings
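The K-hop neighborhood described above can be collected with a breadth-first search that stops expanding at depth K. This sketch returns the full neighborhood; GraphSAGE itself samples a fixed number of neighbors per hop to keep the cost bounded, which is not shown here. The function name and graph are illustrative.

```python
from collections import deque


def k_hop_neighborhood(adjacency, node, k):
    """All nodes reachable from `node` in at most `k` hops, excluding `node`."""
    dist = {node: 0}
    queue = deque([node])
    while queue:
        current = queue.popleft()
        if dist[current] == k:
            continue  # do not expand beyond K hops
        for nbr in adjacency[current]:
            if nbr not in dist:
                dist[nbr] = dist[current] + 1
                queue.append(nbr)
    del dist[node]
    return set(dist)


# Path graph a - b - c - d - e.
path = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c", "e"], "e": ["d"]}
print(k_hop_neighborhood(path, "a", 2))
print(k_hop_neighborhood(path, "c", 1))
```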
Node Embeddings [24, 47, 48, 49]
GraphSAGE [45]
• Information Aggregation
• instead of learning node embeddings directly, GraphSAGE learns
aggregation functions
• aggregation functions accept a neighborhood as input and
combine the neighbors’ embeddings with weights to create a
neighborhood embedding:
• LSTM on a random permutation of the nodes in a neighborhood
• mean (average)
• MLP followed by max pooling
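Two of the aggregators listed above can be sketched in plain Python: the mean aggregator, and a one-layer MLP followed by element-wise max pooling. The weights and bias are fixed, hypothetical values here, whereas GraphSAGE learns them during training. The LSTM aggregator is left out.

```python
def mean_aggregate(neighbor_embeddings):
    """Element-wise average of the neighbors' embeddings."""
    n = len(neighbor_embeddings)
    return [sum(column) / n for column in zip(*neighbor_embeddings)]


def dense_relu(vector, weights, bias):
    """One fully connected layer with ReLU activation."""
    return [max(0.0, sum(w * x for w, x in zip(row, vector)) + b)
            for row, b in zip(weights, bias)]


def max_pool_aggregate(neighbor_embeddings, weights, bias):
    """Transform each neighbor with the layer, then take the element-wise max."""
    transformed = [dense_relu(h, weights, bias) for h in neighbor_embeddings]
    return [max(column) for column in zip(*transformed)]


# Two neighbors with 2-dimensional embeddings.
neighbors = [[1.0, 2.0], [3.0, 4.0]]
weights = [[1.0, -1.0], [0.5, 0.5]]  # hypothetical, would be learned
bias = [0.0, 0.0]

print(mean_aggregate(neighbors))
print(max_pool_aggregate(neighbors, weights, bias))
```

Both functions return a vector of fixed size regardless of how many neighbors are passed in, which is what lets the same aggregator be applied to nodes that were not seen during training.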
Whole Graph Embeddings [24]
The entire graph is represented with a single vector. Such
embeddings are used to make predictions at the graph level or to
compare or visualize graphs, e.g., when comparing
chemical structures.
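A minimal sketch of the idea: describe every node with a small vector, then pool the node vectors into one fixed-length vector for the whole graph. The two hand-crafted node descriptors used here (degree and mean neighbor degree) and the mean pooling are assumptions for illustration. Real whole graph embeddings are usually learned.

```python
import math


def node_descriptor(adjacency, node):
    """[degree, mean degree of the neighbors] for one node."""
    neighbors = adjacency[node]
    degree = len(neighbors)
    mean_nbr_degree = (sum(len(adjacency[n]) for n in neighbors) / degree
                       if degree else 0.0)
    return [float(degree), mean_nbr_degree]


def graph_embedding(adjacency):
    """Mean-pool the node descriptors into a single vector for the graph."""
    descriptors = [node_descriptor(adjacency, node) for node in adjacency]
    return [sum(column) / len(descriptors) for column in zip(*descriptors)]


def cosine_similarity(u, v):
    """Cosine of the angle between two vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


triangle = {"a": ["b", "c"], "b": ["a", "c"], "c": ["a", "b"]}
path = {"a": ["b"], "b": ["a", "c"], "c": ["b"]}

print(graph_embedding(triangle), graph_embedding(path))
print(cosine_similarity(graph_embedding(triangle), graph_embedding(path)))
```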
• ML/AI
• Node classification
• Link Prediction