
Lecture 6: Seminars in Bioinformatics: Graph Mining
Prof. Dr. Taysir Hassan A. Soliman
Information Systems Department
Faculty of Computers and Information
Assiut University
[email protected]
November 27, 2022

31/12/2022 Prof. Taysir Hassan Soliman ; Seminars in Bioinformatics 1

Applications of Graph Mining
• Graphs are becoming increasingly important for modeling complicated structures, such as:
1. Chemical compounds (e.g., whether a particular substructure exists in a chemical compound)
2. Protein structures (e.g., whether a specific protein structure exists)
3. Biological networks
4. Social networks (detection of friendship, communities, and influence between users)
5. The Web
6. Workflows
7. XML documents

Graph mining can be used to:
• Analyze the properties of a real-world graph
• Predict how the structure and properties of a real graph might affect some application
• Develop models that can generate realistic graphs that match patterns found in real graphs
Applications of Graph Mining (Cont…)
• There have been studies on the use of frequent structures:
✓As features to classify chemical compounds
✓To study protein structural families
✓To detect considerably large frequent subpathways in metabolic networks
✓For graph indexing and similarity search in graph databases


Applications (Cont…)
• Find frequent subgraphs within a single graph
• Find frequent subgraphs across several graphs
• Frequent subgraphs are useful for
• characterizing graph sets,
• discriminating different groups of graphs,
• classifying and clustering graphs,
• building graph indices, and facilitating similarity search in graph databases.

Basic Terminologies:
• Directed vs Undirected graphs
• Weighted vs Unweighted graphs
• Rooted vs unrooted graphs: a rooted graph has a root (a main node) to start at
• NP-complete problem: a decision problem that is in NP and to which every other problem in NP can be reduced in polynomial time. No polynomial-time algorithm is known for any NP-complete problem. Many significant computer-science problems belong to this class, e.g., the (decision version of the) traveling salesman problem, satisfiability problems, and graph-covering problems.
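The directed, undirected, and weighted variants above differ only in how edges are stored. The following is a minimal sketch, assuming an adjacency-map representation; the function name build_graph and the sample edges are illustrative and not part of the lecture.

```python
from collections import defaultdict

def build_graph(edges, directed=False):
    """Build an adjacency map {u: {v: weight}} from (u, v, weight) triples.

    For an unweighted graph, pass a constant weight such as 1.
    """
    adj = defaultdict(dict)
    for u, v, w in edges:
        adj[u][v] = w
        if directed:
            adj.setdefault(v, {})  # make sure v appears even with no out-edges
        else:
            adj[v][u] = w          # undirected: store the edge in both directions
    return dict(adj)

edges = [("a", "b", 1.0), ("b", "c", 2.5)]
undirected = build_graph(edges)
directed = build_graph(edges, directed=True)
```

In the undirected map, "b" lists both "a" and "c" as neighbors; in the directed map it lists only "c". A rooted graph would additionally record one designated start vertex.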


[Figure: examples of an undirected graph, a directed graph, a rooted graph, and a weighted graph]


Frequent Graph Mining
• Among the various kinds of graph patterns, frequent substructures
are the very basic patterns that can be discovered in a collection of
graphs.


Important Notations & Definitions:
• A graph is denoted by g.
• The vertex set of a graph g is denoted by V(g).
• The edge set is denoted by E(g).
• A label function, L, maps a vertex or an edge to a label.
• A graph g is a subgraph of another graph g′ if there exists a subgraph isomorphism from g to g′.
• Given a labeled graph dataset, D = {G1, G2, ..., Gn}, we define support(g) (or frequency(g)) as the percentage (or number) of graphs in D where g is a subgraph.
• A frequent graph is a graph whose support is no less than a minimum support threshold, min_sup.
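The definitions above can be made concrete with a small sketch. It assumes undirected graphs without self-loops, stored as a pair (vertex labels, edge labels), and uses a brute-force subgraph isomorphism test that is only practical for tiny graphs. The helper names make_graph, is_subgraph, and support, and the toy dataset, are illustrative assumptions.

```python
from itertools import permutations

def make_graph(labels, edges):
    """labels: {vertex: label}; edges: iterable of (u, v, edge_label)."""
    return labels, {frozenset((u, v)): lab for u, v, lab in edges}

def is_subgraph(pattern, graph):
    """Brute force: try every injective mapping of pattern vertices into graph."""
    p_labels, p_edges = pattern
    g_labels, g_edges = graph
    p_nodes = list(p_labels)
    for image in permutations(g_labels, len(p_nodes)):
        m = dict(zip(p_nodes, image))
        if any(p_labels[v] != g_labels[m[v]] for v in p_nodes):
            continue  # vertex labels must be preserved
        if all(g_edges.get(frozenset(m[v] for v in e)) == lab
               for e, lab in p_edges.items()):
            return True  # every pattern edge maps to an equally labeled edge
    return False

def support(pattern, dataset):
    """Fraction of graphs in the dataset that contain the pattern."""
    return sum(is_subgraph(pattern, g) for g in dataset) / len(dataset)

# Toy dataset D = {G1, G2, G3} (made-up molecules, for illustration only).
g1 = make_graph({1: "C", 2: "C", 3: "O"}, [(1, 2, "single"), (2, 3, "single")])
g2 = make_graph({1: "C", 2: "O"}, [(1, 2, "double")])
g3 = make_graph({1: "C", 2: "C", 3: "N"}, [(1, 2, "single"), (2, 3, "single")])
dataset = [g1, g2, g3]

c_c = make_graph({"x": "C", "y": "C"}, [("x", "y", "single")])
c_o = make_graph({"x": "C", "y": "O"}, [("x", "y", "single")])
min_sup = 0.5
frequent = [name for name, p in (("C-C", c_c), ("C-O", c_o))
            if support(p, dataset) >= min_sup]
```

Here the C-C pattern occurs in G1 and G3 (support 2/3) and is frequent at min_sup = 0.5, while the singly bonded C-O pattern occurs only in G1 (support 1/3) and is not.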


How can we discover frequent substructures?
The discovery of frequent substructures usually consists of two steps:
• In the first step, we generate frequent substructure candidates.
• In the second step, we check the frequency of each candidate.

• Most studies on frequent substructure discovery focus on optimizing the first step, because the second step involves a subgraph isomorphism test whose computational complexity is excessively high (i.e., NP-complete).
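A rough feel for the cost of the second step: a naive subgraph isomorphism test may have to try every injective mapping of the pattern's k vertices into a graph's n vertices, and there are n!/(n-k)! such mappings. The sketch below only counts them; it assumes naive search (practical algorithms prune most mappings), and the function name is illustrative.

```python
import math

def naive_mapping_count(k, n):
    """Number of injective mappings from k pattern vertices to n graph vertices."""
    return math.perm(n, k)  # n! / (n - k)!

small = naive_mapping_count(3, 10)   # 720
larger = naive_mapping_count(5, 20)  # 1860480
```

Because this count applies to every candidate and every graph in the dataset, generating fewer candidates in the first step directly reduces the number of expensive tests.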


Example

Question: does this subgraph exist in the graphs of this dataset?
[Figure: an example graph dataset and a query subgraph]

Approaches: Apriori-based Approach
• There are two basic approaches to this problem: an Apriori-based approach and a
pattern-growth approach.
• Apriori-based frequent substructure mining algorithms share similar
characteristics with Apriori-based frequent itemset mining algorithms
• The search for frequent graphs starts with graphs of small “size,” and proceeds in
a bottom-up manner by generating candidates having an extra vertex, edge, or
path.
• The definition of graph size depends on the algorithm used.


AprioriGraph Algorithm

• Sk is the frequent substructure set of size k.
1. AprioriGraph adopts a level-wise mining methodology.
2. At each iteration, the size of newly discovered frequent substructures is increased by one.
3. These new substructures are first generated by joining two similar but slightly different frequent subgraphs that were discovered in the previous call to AprioriGraph.
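The level-wise loop described above can be sketched under a strong simplifying assumption: every vertex label is unique within a graph, so a graph can be identified with its set of labeled edges and the subgraph test becomes a subset test. Size is counted in edges, connectivity of patterns is not enforced, and support is an absolute count. This is an illustration of the join, prune, and count cycle, not the AprioriGraph pseudocode from the slide; all names are hypothetical.

```python
from itertools import combinations

def apriori_graph(dataset, min_sup):
    """dataset: list of graphs, each a frozenset of edges (edge = frozenset of 2 labels).

    Returns {pattern: support count} for every frequent edge set.
    """
    def support(pattern):
        return sum(pattern <= g for g in dataset)  # subset test replaces isomorphism

    # S1: frequent patterns of size 1 (single edges).
    all_edges = {e for g in dataset for e in g}
    level = {frozenset([e]) for e in all_edges if support(frozenset([e])) >= min_sup}
    frequent = {}
    while level:
        for p in level:
            frequent[p] = support(p)
        # Join: two size-k patterns that differ in exactly one edge give a size-(k+1) candidate.
        candidates = {a | b for a, b in combinations(level, 2)
                      if len(a | b) == len(a) + 1}
        # Prune (Apriori property), then count the surviving candidates.
        level = {c for c in candidates
                 if all(c - {e} in level for e in c) and support(c) >= min_sup}
    return frequent

def edge(a, b):
    return frozenset((a, b))

AB, BC, CD = edge("A", "B"), edge("B", "C"), edge("C", "D")
dataset = [frozenset({AB, BC, CD}), frozenset({AB, BC}), frozenset({AB, CD})]
result = apriori_graph(dataset, min_sup=2)
```

With min_sup = 2 the run finds three frequent single edges and two frequent two-edge patterns ({AB, BC} and {AB, CD}); the three-edge candidate is pruned without counting because its sub-pattern {BC, CD} is infrequent. With repeated labels, the subset test would have to be replaced by a real subgraph isomorphism test.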


Other Apriori-based Graph Mining Algorithms
• Apriori-based algorithms for frequent substructure mining include AGM, FSG, and a path-join method.
• AGM shares similar characteristics with Apriori-based itemset mining.
• FSG and the path-join method explore edges and connections in an Apriori-based fashion.

• Each of these methods explores various candidate generation strategies.


AGM Algorithm
• Uses a vertex-based candidate generation method that increases the substructure size by one vertex at each
iteration of AprioriGraph.
• Two size-k frequent graphs are joined only if they have the same size-(k - 1) subgraph.
• Here, graph size is the number of vertices in the graph.
• The newly formed candidate includes the size-(k - 1) subgraph in common and the additional two vertices from the two size-k patterns.


FSG (Frequent Subgraph Mining) Algorithm
• FSG adopts an edge-based candidate generation strategy that increases the substructure size by one edge in
each call of AprioriGraph.
• Two size-k patterns are merged if and only if they share the same subgraph having k - 1 edges, which is
called the core.
• Here, graph size is taken to be the number of edges in the graph.

• The newly formed candidate includes the core and the additional two edges from the size-k patterns. Each candidate has one more edge than these two patterns.


Edge-disjoint Path Method

• Graphs are classified by the number of disjoint paths they have

• Two paths are edge-disjoint if they do not share any common edge.

• A substructure pattern with k +1 disjoint paths is generated by joining substructures with k disjoint paths.
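The edge-disjointness condition above is easy to check directly. A minimal sketch, assuming undirected graphs and paths given as vertex sequences; the function names and sample paths are illustrative.

```python
def path_edges(path):
    """Set of undirected edges along a path given as a vertex sequence."""
    return {frozenset(pair) for pair in zip(path, path[1:])}

def edge_disjoint(p, q):
    """True if the two paths share no edge (shared vertices are allowed)."""
    return path_edges(p).isdisjoint(path_edges(q))

p = ["a", "b", "c"]
q = ["c", "d", "a"]   # meets p at vertices a and c, but uses different edges
r = ["d", "c", "b"]   # reuses the edge b-c from p
```

Here p and q are edge-disjoint even though they touch at two vertices, while p and r are not, because both traverse the edge between b and c.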

Disadvantages of Apriori-based approaches:

• Considerable overhead when joining two size-k frequent substructures to generate size-(k+1) graph candidates.

• Candidate generation is level-wise, so the search must be breadth-first (BFS): to determine whether a size-(k+1) graph is frequent, it must check all of its corresponding size-k subgraphs to obtain an upper bound of its frequency.


Pattern-growth approach
• The pattern-growth approach can use breadth-first search (BFS) as well as depth-first search (DFS), the latter of which consumes less memory.
• The gSpan algorithm is a pattern-growth method designed to reduce the generation of duplicate graphs.
• It need not search previously discovered frequent graphs for duplicate detection.
• It does not extend any duplicate graph, yet still guarantees the discovery of the complete set of frequent graphs.
• gSpan adopts depth-first search to traverse graphs.
• A starting vertex is randomly chosen, and the vertices in the graph are marked so that we can tell which vertices have been visited.

• The visited vertex set is expanded repeatedly until a full depth-first
search (DFS) tree is built.
• One graph may have various DFS trees depending on how the depth-
first search is performed (i.e., the vertex visiting order).
• Given a DFS tree T , we call the starting vertex in T , v0, the root. The
last visited vertex, vn, is called the right-most vertex. The straight path
from v0 to vn is called the right-most path.
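The DFS-tree terms above can be illustrated with a short sketch. It assumes an undirected graph stored as adjacency lists, where the list order decides the visiting order; the function names and the four-vertex example are illustrative.

```python
def dfs_tree(adj, root):
    """Return (order, parent): DFS visiting order and each vertex's tree parent."""
    order, parent = [], {root: None}

    def visit(u):
        order.append(u)
        for v in adj[u]:
            if v not in parent:   # v has not been visited yet
                parent[v] = u
                visit(v)

    visit(root)
    return order, parent

def rightmost_path(order, parent):
    """Path from the root v0 to the last visited (right-most) vertex vn."""
    path, v = [], order[-1]
    while v is not None:
        path.append(v)
        v = parent[v]
    return path[::-1]

# Triangle v0-v1-v2 with a pendant vertex v3 attached to v1.
adj_a = {"v0": ["v1", "v2"], "v1": ["v0", "v2", "v3"], "v2": ["v1", "v0"], "v3": ["v1"]}
adj_b = dict(adj_a, v0=["v2", "v1"])  # same graph, different visiting order at v0

order_a, parent_a = dfs_tree(adj_a, "v0")
order_b, parent_b = dfs_tree(adj_b, "v0")
path_a = rightmost_path(order_a, parent_a)
path_b = rightmost_path(order_b, parent_b)
```

Both runs start at v0 and end at the right-most vertex v3, but the first tree has right-most path v0, v1, v3 while the second has v0, v2, v1, v3, which shows how one graph can have several DFS trees.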


• gSpan: Given a graph G and a DFS tree T in G, a new edge e can be added between the right-most vertex and another vertex on the right-most path (backward extension),
• or it can introduce a new vertex and connect it to a vertex on the right-most path (forward extension).
• Because both kinds of extension take place on the right-most path, we call them right-most extension.
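The two kinds of right-most extension can be enumerated mechanically. The sketch below is a simplification: vertices are integers in DFS visiting order, vertex and edge labels are ignored, and gSpan's ordering and duplicate checks are left out. The function name and the example pattern are illustrative assumptions.

```python
def rightmost_extensions(edges, rm_path):
    """List the structural right-most extensions of a pattern.

    edges:   set of frozenset({u, v}) over integer vertices 0..n (DFS order)
    rm_path: right-most path, from the root to the right-most vertex
    """
    rm_vertex = rm_path[-1]
    new_vertex = max(v for e in edges for v in e) + 1
    # Backward: connect the right-most vertex to another vertex on the
    # right-most path, unless that edge already exists.
    backward = [(rm_vertex, v) for v in rm_path[:-1]
                if frozenset((rm_vertex, v)) not in edges]
    # Forward: attach a brand-new vertex to any vertex on the right-most path.
    forward = [(v, new_vertex) for v in rm_path]
    return backward, forward

# Pattern: triangle 0-1-2 plus vertex 3 attached to 1; right-most path 0, 1, 3.
edges = {frozenset(e) for e in [(0, 1), (1, 2), (2, 0), (1, 3)]}
backward, forward = rightmost_extensions(edges, [0, 1, 3])
```

For this pattern the only backward extension is the edge (3, 0), since 3 is already connected to 1, and the forward extensions attach a new vertex 4 to 0, 1, or 3. Vertex 2 is never extended because it is not on the right-most path.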
