Graph Based Machine Learning on Relational Data
Problems and Methods
Machine Learning using Graphs
- Machine Learning is iterative but iteration
can also be seen as traversal.
- Many domains have structures already
modeled as graphs (health records, finance)
- Important analyses are graph algorithms:
clusters, influence propagation, centrality.
- Performance benefits on sparse data
- More understandable implementation
Iterative PageRank in Python
import numpy as np
from scipy.sparse import csc_matrix

def pageRank(G, s=.85, maxerr=.001):
    n = G.shape[0]
    # transform G into Markov matrix M (each non-sink row sums to 1)
    M = csc_matrix(G, dtype=float)
    rsums = np.array(M.sum(1))[:, 0]
    ri, ci = M.nonzero()
    M.data /= rsums[ri]
    sink = rsums == 0  # bool array of sink states
    # compute pagerank r until we converge
    ro, r = np.zeros(n), np.ones(n)
    while np.sum(np.abs(r - ro)) > maxerr:
        ro = r.copy()
        for i in range(n):
            Ii = np.array(M[:, i].todense())[:, 0]  # inlinks of state i
            Si = sink / float(n)  # account for sink states
            Ti = np.ones(n) / float(n)  # account for teleportation
            r[i] = ro.dot(Ii * s + Si * s + Ti * (1 - s))
    return r / sum(r)  # return normalized pagerank
Graph-Based PageRank in Gremlin
pagerank = [:].withDefault{0}
size = uris.size();
uris.each{
    count = it.outE.count();
    if(count == 0 || rand.nextDouble() > 0.85) {
        rank = pagerank[it]
        uris.each {
            pagerank[it] = pagerank[it] / uris.size()
        }
    }
    rank = pagerank[it] / it.outE.count();
    it.out.each{
        pagerank[it] = pagerank[it] + rank;
    }
}
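For comparison with the matrix version, the same idea can be sketched in Python as an edge traversal. This is a minimal sketch, not a port of the Gremlin snippet: it uses a deterministic power iteration instead of random teleportation, and the function name, the adjacency-dict input, and the toy graph are assumptions made for illustration. Every link target is assumed to also appear as a key.

```python
def pagerank_traversal(out_links, damping=0.85, tol=1e-6, max_iter=100):
    """Power-iteration PageRank expressed as edge traversal.

    out_links maps each node to the list of nodes it links to
    (hypothetical input format; every target must also be a key).
    """
    nodes = list(out_links)
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(max_iter):
        # Rank held by sink nodes (no out-edges) is spread over all nodes.
        sink_mass = sum(rank[v] for v in nodes if not out_links[v])
        base = (1.0 - damping) / n + damping * sink_mass / n
        new_rank = {v: base for v in nodes}
        # Traverse each out-edge once, pushing a share of rank along it.
        for v in nodes:
            targets = out_links[v]
            if targets:
                share = damping * rank[v] / len(targets)
                for w in targets:
                    new_rank[w] += share
        delta = sum(abs(new_rank[v] - rank[v]) for v in nodes)
        rank = new_rank
        if delta < tol:
            break
    return rank


# Toy graph, made up for illustration.
toy_graph = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"]}
ranks = pagerank_traversal(toy_graph)
```

Each pass touches every edge once, so the work per iteration grows with the number of edges rather than with the square of the number of nodes.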
Learning by Example
- Machine Learning requires many instances with
which to fit a model to make predictions.
- Current large scale analytical methods (Pregel,
Giraph, GraphLab) are in-memory without data
storage components.
- And while Neo4j, OrientDB, and Titan are ok...
- Most (active) data sits in relational databases
where users interact with it in real time via
transactions in web applications.
Is it because relational data is a legacy system we must support?
Is it purely because of inertia?
NO! It’s because Relational Data is awesome!
Awesome sauce relational data of the future.
Requirements
- Ability to express queries/algorithms using a
declarative, graph-domain specific language
like SQL, or at the very least via UDFs.
- Ability to explore and identify hidden or
implicit graphs in the database.
- Combine in-memory analytics with some
disk storage facility that is transactional.
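The second requirement, finding implicit graphs, can be illustrated with a small sketch. The `purchases` table and its rows are hypothetical; the point is only that a self-join on a shared column can expose edges that no table stores directly.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical relational data: no table here is "a graph".
conn.executescript("""
    CREATE TABLE purchases (customer TEXT, product TEXT);
    INSERT INTO purchases VALUES
        ('ann', 'book'), ('bob', 'book'),
        ('bob', 'lamp'), ('cat', 'lamp'),
        ('dan', 'desk');
""")

# A self-join on the shared column exposes a customer-customer graph:
# two customers are adjacent when they bought the same product.
implicit_edges = conn.execute("""
    SELECT p1.customer, p2.customer, COUNT(*) AS shared
    FROM purchases AS p1
    JOIN purchases AS p2
      ON p1.product = p2.product AND p1.customer < p2.customer
    GROUP BY p1.customer, p2.customer
    ORDER BY p1.customer, p2.customer
""").fetchall()
```

Here `implicit_edges` holds one weighted edge per pair of customers with at least one product in common; `dan` shares nothing and so has no edges.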
Approach 1: ETL Methods
[Diagram: timeline from t = 0 to t > 0 with the stages extract, transform, load, synchronize, analyze]
Approach 1: ETL Methods
The Good
- Processing is not physical layer dependent
- Relational data storage with real time interaction
- Analytics can scale in size to Hadoop or in speed to in-memory
computation frameworks.
The Bad
- Must know structure of graph in relational database
ahead of time, no exploration.
- Synchronization can cause inconsistency.
- OLAP processes incur resource penalty (I/O or CPU
depending on location).
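A minimal sketch of the ETL approach, assuming a hypothetical `follows` table and an insert-only workload. It reads the edges into an in-memory adjacency structure, analyzes them, and then synchronizes after a later transaction. The helper names and the row-id watermark are illustrative choices, not part of the slides.

```python
import sqlite3
from collections import defaultdict

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE follows (id INTEGER PRIMARY KEY, src TEXT, dst TEXT);
    INSERT INTO follows (src, dst) VALUES ('ann', 'bob'), ('bob', 'cat');
""")

def extract(conn, after_id=0):
    """Extract edge rows added since the last synchronized row id."""
    return conn.execute(
        "SELECT id, src, dst FROM follows WHERE id > ? ORDER BY id", (after_id,)
    ).fetchall()

def load(graph, rows):
    """Transform rows into adjacency sets and load them into the graph."""
    last_id = 0
    for row_id, src, dst in rows:
        graph[src].add(dst)
        graph[dst]  # touching the key makes sure the target node exists
        last_id = row_id
    return last_id

def analyze(graph):
    """Out-degree per node, standing in for a real graph analysis."""
    return {node: len(targets) for node, targets in graph.items()}

# Initial extract, transform, load.
graph = defaultdict(set)
watermark = load(graph, extract(conn))
before = analyze(graph)

# Later, a transaction commits; the in-memory copy is stale until synchronized.
conn.execute("INSERT INTO follows (src, dst) VALUES ('ann', 'cat')")
stale = analyze(graph)
watermark = load(graph, extract(conn, watermark)) or watermark
after = analyze(graph)
```

Between the insert and the synchronization, `stale` still reports the old degrees, which is the inconsistency listed under "The Bad". Updates and deletes would need more than a row-id watermark.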
Approach 2: Store Graph in RDBMS
The Good
- Can utilize relational devices like indices and parallel
joins for graph-specific queries on existing data.
- Simply use SQL for the data access mechanism.
- Transactional storage of the data.
The Bad
- Constrained to graph-specific schema.
- Many joins required for traversal.
- Depending on storage mechanisms there may be too
few or too many tables in the database for applications.
- Must convert existing database to this structure.
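As a rough sketch of this approach, the following stores a graph in a graph-specific `nodes`/`edges` schema (table and column names are assumptions) and traverses it with plain SQL. A single hop takes two joins; a multi-hop traversal repeats the join, written here as a recursive common table expression with an arbitrary depth limit of 3.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE nodes (id INTEGER PRIMARY KEY, label TEXT);
    CREATE TABLE edges (src INTEGER REFERENCES nodes(id),
                        dst INTEGER REFERENCES nodes(id));
    CREATE INDEX edges_src ON edges (src);
    INSERT INTO nodes VALUES (1, 'a'), (2, 'b'), (3, 'c'), (4, 'd');
    INSERT INTO edges VALUES (1, 2), (2, 3), (3, 1), (4, 1);
""")

# One hop: join nodes to edges and back to nodes.
one_hop = conn.execute("""
    SELECT n2.label
    FROM nodes AS n1
    JOIN edges AS e ON e.src = n1.id
    JOIN nodes AS n2 ON n2.id = e.dst
    WHERE n1.label = 'a'
""").fetchall()

# Multi-hop: the recursive CTE repeats the edge join up to depth 3.
reachable = conn.execute("""
    WITH RECURSIVE reach(id, depth) AS (
        SELECT id, 0 FROM nodes WHERE label = 'a'
        UNION
        SELECT e.dst, r.depth + 1
        FROM reach AS r JOIN edges AS e ON e.src = r.id
        WHERE r.depth < 3
    )
    SELECT n.label, MIN(r.depth)
    FROM reach AS r JOIN nodes AS n ON n.id = r.id
    GROUP BY n.label
    ORDER BY n.label
""").fetchall()
```

The index on `edges(src)` is the kind of relational device the slide mentions; the depth limit keeps the recursion finite on cyclic graphs.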
Approach 3: Use Graph Query Language
[Diagram components: API, Graph DSL Query, Query Translator, SQL Queries, Optimizer, Final SQL Queries, Query Result]
Approach 3: Use Graph Query Language
The Good
- DSL in the graph domain that easily expresses graph
analytics but also relational semantics.
- Can use existing relational schemas; allows for
exploration and identification of graphs.
- Computation is offloaded into in-memory processing
The Bad
- Many graphs or big graphs can cause too many joins
without optimal query translation.
- User is required to facilitate definition of relational
structure into a graph representation.
- May not leverage relational resources.
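To make the translation step concrete, here is a deliberately tiny sketch. The dictionary-based "DSL", the function names, and the `emails` table are all hypothetical and stand in for a real graph query language. The user declares which existing columns act as edge endpoints, and a translator turns a graph question into SQL.

```python
import sqlite3

def translate_edges(spec):
    """Translate a declarative edge definition into a SQL query.

    `spec` is a hypothetical mini-DSL: it names the table that links two
    entities and the columns that identify each endpoint.
    """
    table, src, dst = spec["table"], spec["src"], spec["dst"]
    for name in (table, src, dst):
        # Identifiers cannot be bound as parameters, so validate them.
        if not name.isidentifier():
            raise ValueError(f"not a valid identifier: {name!r}")
    return f"SELECT {src} AS src, {dst} AS dst FROM {table}"

def neighbors(conn, spec, node):
    """Answer a graph question (out-neighbors) by running translated SQL."""
    sql = f"SELECT dst FROM ({translate_edges(spec)}) WHERE src = ? ORDER BY dst"
    return [row[0] for row in conn.execute(sql, (node,))]

conn = sqlite3.connect(":memory:")
# An existing relational schema, left unchanged.
conn.executescript("""
    CREATE TABLE emails (sender TEXT, recipient TEXT, subject TEXT);
    INSERT INTO emails VALUES
        ('ann', 'bob', 'hi'), ('ann', 'cat', 'report'), ('bob', 'ann', 're: hi');
""")

# The user defines how relational columns map onto a graph.
email_graph = {"table": "emails", "src": "sender", "dst": "recipient"}
sql = translate_edges(email_graph)
ann_neighbors = neighbors(conn, email_graph, "ann")
```

The schema stays as it is, but the user has to supply the mapping, and a naive translator like this one does nothing to limit the number of joins a larger query would generate.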
Any Questions?
Thank you!
Presented By:
Konstantinos Xirogiannopoulos <kostasx@cs.umd.edu>
Benjamin Bengfort <bengfort@cs.umd.edu>
May 7, 2015
Transcript: #StandardsGoals for 2025: Standards & certification roundup - Tec...
BookNet Canada
 
UiPath Community Berlin: Orchestrator API, Swagger, and Test Manager API
UiPath Community Berlin: Orchestrator API, Swagger, and Test Manager APIUiPath Community Berlin: Orchestrator API, Swagger, and Test Manager API
UiPath Community Berlin: Orchestrator API, Swagger, and Test Manager API
UiPathCommunity
 
Splunk Security Update | Public Sector Summit Germany 2025
Splunk Security Update | Public Sector Summit Germany 2025Splunk Security Update | Public Sector Summit Germany 2025
Splunk Security Update | Public Sector Summit Germany 2025
Splunk
 
TrsLabs - Fintech Product & Business Consulting
TrsLabs - Fintech Product & Business ConsultingTrsLabs - Fintech Product & Business Consulting
TrsLabs - Fintech Product & Business Consulting
Trs Labs
 
Mobile App Development Company in Saudi Arabia
Mobile App Development Company in Saudi ArabiaMobile App Development Company in Saudi Arabia
Mobile App Development Company in Saudi Arabia
Steve Jonas
 
Special Meetup Edition - TDX Bengaluru Meetup #52.pptx
Special Meetup Edition - TDX Bengaluru Meetup #52.pptxSpecial Meetup Edition - TDX Bengaluru Meetup #52.pptx
Special Meetup Edition - TDX Bengaluru Meetup #52.pptx
shyamraj55
 
Graph Based Machine Learning on Relational Data

  • 1. Graph Based Machine Learning on Relational Data: Problems and Methods
  • 7. Machine Learning using Graphs
      - Machine learning is iterative, but iteration can also be seen as traversal.
      - Many domains have structures already modeled as graphs (health records, finance).
      - Important analyses are graph algorithms: clusters, influence propagation, centrality.
      - Performance benefits on sparse data.
      - More understandable implementation.
  • 8. Iterative PageRank in Python

        import numpy as np
        from scipy.sparse import csc_matrix

        def pageRank(G, s=.85, maxerr=.001):
            n = G.shape[0]
            # transform G into markov matrix M
            # (np.float was removed from NumPy; the builtin float is equivalent)
            M = csc_matrix(G, dtype=float)
            rsums = np.array(M.sum(1))[:, 0]
            ri, ci = M.nonzero()
            M.data /= rsums[ri]
            sink = rsums == 0  # bool array of sink states
            # Compute pagerank r until we converge
            ro, r = np.zeros(n), np.ones(n)
            while np.sum(np.abs(r - ro)) > maxerr:
                ro = r.copy()
                for i in range(0, n):  # xrange in the original Python 2 code
                    Ii = np.array(M[:, i].todense())[:, 0]  # inlinks of state i
                    Si = sink / float(n)  # account for sink states
                    Ti = np.ones(n) / float(n)  # account for teleportation
                    r[i] = ro.dot(Ii * s + Si * s + Ti * (1 - s))
            return r / sum(r)  # return normalized pagerank
  • 9. Graph-Based PageRank in Gremlin

        pagerank = [:].withDefault{0}
        size = uris.size();
        uris.each{
          count = it.outE.count();
          if(count == 0 || rand.nextDouble() > 0.85) {
            rank = pagerank[it]
            uris.each {
              pagerank[it] = pagerank[it] / uris.size()
            }
          }
          rank = pagerank[it] / it.outE.count();
          it.out.each{
            pagerank[it] = pagerank[it] + rank;
          }
        }
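The "iteration as traversal" idea behind slides 8 and 9 can also be sketched in plain Python. This is not a translation of either slide's code: it is a minimal illustration that assumes the graph is given as a dictionary of out-links, and the names `pagerank_by_traversal`, `out_links`, `damping`, and `iterations` are hypothetical. Each round walks every vertex and pushes a share of its rank along its outgoing edges, instead of multiplying by a matrix column.

```python
def pagerank_by_traversal(out_links, damping=0.85, iterations=50):
    """PageRank by walking edges; out_links maps each vertex to its successors.

    Assumes every vertex appears as a key of out_links (sinks map to an
    empty list). Runs a fixed number of rounds rather than testing convergence.
    """
    nodes = list(out_links)
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iterations):
        # Every vertex starts the round with its teleportation share.
        next_rank = {v: (1.0 - damping) / n for v in nodes}
        for v in nodes:
            targets = out_links[v]
            if targets:
                # Traverse each outgoing edge and hand over an equal share.
                share = damping * rank[v] / len(targets)
                for t in targets:
                    next_rank[t] += share
            else:
                # Sink vertex: spread its rank evenly over all vertices.
                share = damping * rank[v] / n
                for t in nodes:
                    next_rank[t] += share
        rank = next_rank
    return rank


if __name__ == "__main__":
    example = {"a": ["b"], "b": ["a"], "c": ["a"]}
    for vertex, score in sorted(pagerank_by_traversal(example).items()):
        print(vertex, round(score, 4))
```

The per-vertex loop body is close in spirit to what a vertex program does in a Pregel-style framework, which is one reason the graph formulation tends to read more directly than the matrix version.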
  • 13. Learning by Example
      - Machine learning requires many instances with which to fit a model to make predictions.
      - Current large-scale analytical methods (Pregel, Giraph, GraphLab) are in-memory, without data storage components.
      - And while Neo4j, OrientDB, and Titan are ok...
      - Most (active) data sits in relational databases, where users interact with it in real time via transactions in web applications.
  • 14. Is it because relational data is a legacy system we must support? Is it purely because of inertia?
  • 15. NO! It’s because Relational Data is awesome! Awesome sauce relational data of the future.
  • 18. Requirements
      - Ability to express queries/algorithms using a declarative, graph-domain-specific language like SQL, or at the very least via UDFs.
      - Ability to explore and identify hidden or implicit graphs in the database.
      - Combine in-memory analytics with some disk storage facility that is transactional.
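To make "hidden or implicit graphs" concrete, here is a minimal sketch using Python's standard `sqlite3` module. The schema (`author`, `authorship`) and its rows are invented for illustration and do not come from the talk. No table stores a co-author graph, yet a self-join on the shared `paper_id` column yields one.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical schema: no table here stores a graph explicitly.
    CREATE TABLE author (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE authorship (author_id INTEGER, paper_id INTEGER);
    INSERT INTO author VALUES (1, 'Ada'), (2, 'Bo'), (3, 'Cy');
    INSERT INTO authorship VALUES (1, 10), (2, 10), (2, 11), (3, 11);
""")

# Two authors are connected when they share a paper; the edge weight is the
# number of shared papers. The x.author_id < y.author_id condition keeps one
# row per unordered pair and drops self-loops.
coauthor_edges = conn.execute("""
    SELECT a.name, b.name, COUNT(*) AS shared_papers
    FROM authorship AS x
    JOIN authorship AS y
      ON x.paper_id = y.paper_id AND x.author_id < y.author_id
    JOIN author AS a ON a.id = x.author_id
    JOIN author AS b ON b.id = y.author_id
    GROUP BY a.name, b.name
    ORDER BY a.name, b.name
""").fetchall()

for source, target, weight in coauthor_edges:
    print(source, target, weight)
```

The same tables also hide an author-to-paper bipartite graph, which is the kind of choice an exploration tool would need to surface to the user.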
  • 19. Approach 1: ETL Methods [diagram: extract, transform, load, synchronize, analyze; time axis marked t = 0 and t > 0]
  • 21. Approach 1: ETL Methods
      The Good
      - Processing is not physical layer dependent.
      - Relational data storage with real time interaction.
      - Analytics can scale in size to Hadoop or in speed to in-memory computation frameworks.
      The Bad
      - Must know structure of graph in relational database ahead of time; no exploration.
      - Synchronization can cause inconsistency.
      - OLAP processes incur resource penalty (I/O or CPU depending on location).
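The ETL approach and its synchronization problem can be sketched in a few lines. This is an illustration under stated assumptions, not the presenters' pipeline: the `follows` table and the helper names `extract_edges`, `load_graph`, and `in_degree` are hypothetical, and an in-memory dictionary stands in for an analytics framework.

```python
import sqlite3


def extract_edges(conn):
    """Extract: pull the edge list out of the relational store."""
    return conn.execute("SELECT follower_id, followee_id FROM follows").fetchall()


def load_graph(edges):
    """Transform and load: build an in-memory adjacency structure."""
    graph = {}
    for src, dst in edges:
        graph.setdefault(src, set()).add(dst)
        graph.setdefault(dst, set())  # make sure sinks are present as vertices
    return graph


def in_degree(graph):
    """Analyze: count incoming edges per vertex."""
    counts = {v: 0 for v in graph}
    for targets in graph.values():
        for t in targets:
            counts[t] += 1
    return counts


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE follows (follower_id INTEGER, followee_id INTEGER)")
conn.executemany("INSERT INTO follows VALUES (?, ?)", [(1, 2), (3, 2), (2, 1)])

# t = 0: take a snapshot of the graph and analyze it.
snapshot = load_graph(extract_edges(conn))
degrees_at_t0 = in_degree(snapshot)

# t > 0: the application keeps writing, and the snapshot silently goes stale.
conn.execute("INSERT INTO follows VALUES (1, 3)")
stale = in_degree(snapshot)
fresh = in_degree(load_graph(extract_edges(conn)))
print(stale[3], fresh[3])
```

Note that `load_graph` hard-codes which columns form an edge, which is the "must know structure ahead of time" limitation from the slide.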
  • 22. Approach 2: Store Graph in RDBMS
  • 24. Approach 2: Store Graph in RDBMS
      The Good
      - Can utilize relational devices like indices and parallel joins for graph-specific queries on existing data.
      - Simply use SQL for the data access mechanism.
      - Transactional storage of the data.
      The Bad
      - Constrained to graph-specific schema.
      - Many joins required for traversal.
      - Depending on storage mechanisms there may be too few or too many tables in the database for applications.
      - Must convert existing database to this structure.
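A minimal sketch of a graph-specific schema, again with `sqlite3` and invented data. The `vertex` and `edge` tables and the helper `reachable_from` are assumptions made for illustration. A fixed-length path needs one self-join per hop, while open-ended traversal is expressed here with a recursive common table expression, which SQLite supports.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical graph-specific schema: one vertex table, one edge table.
    CREATE TABLE vertex (id INTEGER PRIMARY KEY, label TEXT);
    CREATE TABLE edge (
        src INTEGER REFERENCES vertex(id),
        dst INTEGER REFERENCES vertex(id)
    );
    CREATE INDEX edge_src ON edge (src);
    INSERT INTO vertex VALUES (1, 'a'), (2, 'b'), (3, 'c'), (4, 'd');
    INSERT INTO edge VALUES (1, 2), (2, 3), (3, 4), (4, 2);
""")

# A two-hop path costs one self-join; every further hop adds another join.
two_hops = conn.execute("""
    SELECT e1.src, e2.dst
    FROM edge AS e1
    JOIN edge AS e2 ON e1.dst = e2.src
    WHERE e1.src = 1
""").fetchall()


def reachable_from(conn, start):
    """Return all vertex ids reachable from start, including start itself."""
    rows = conn.execute("""
        WITH RECURSIVE reach(id) AS (
            SELECT ?
            UNION              -- UNION removes duplicates, so cycles terminate
            SELECT edge.dst FROM edge JOIN reach ON edge.src = reach.id
        )
        SELECT id FROM reach ORDER BY id
    """, (start,)).fetchall()
    return [row[0] for row in rows]


print(two_hops)
print(reachable_from(conn, 3))
```

The index on `edge.src` is the kind of relational device the slide lists under "The Good"; the per-hop joins are the cost listed under "The Bad".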
  • 25. Approach 3: Use Graph Query Language [diagram components: API, Optimizer, Query Result, Query Translator, SQL Queries, Final SQL Queries, Graph DSL Query]
  • 27. Approach 3: Use Graph Query Language
      The Good
      - DSL in the graph domain that easily expresses graph analytics but also relational semantics.
      - Can use existing relational schemas; allows for exploration and identification of graphs.
      - Computation is offloaded into in-memory processing.
      The Bad
      - Many graphs or big graphs can cause too many joins without optimal query translation.
      - User is required to facilitate definition of relational structure into a graph representation.
      - May not leverage relational resources.
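The query-translation idea can be illustrated with a toy translator. Everything here is hypothetical: `EdgeSpec`, `translate_neighbors`, `neighbors`, and the `orders` table are invented for this sketch and do not correspond to any real graph DSL or to the system in the diagram. The user declares which columns of an existing table act as edge endpoints, and a single graph operation ("neighbors of a vertex") is translated into SQL against that unchanged schema.

```python
import sqlite3
from dataclasses import dataclass


@dataclass(frozen=True)
class EdgeSpec:
    """Hypothetical user-supplied mapping from relational columns to edges."""
    table: str
    src_column: str
    dst_column: str


def translate_neighbors(spec):
    """Translate the graph query 'neighbors of vertex v' into a SQL string.

    Identifiers are interpolated directly, so the spec must come from
    trusted code, never from user input.
    """
    return (
        f"SELECT DISTINCT {spec.dst_column} FROM {spec.table} "
        f"WHERE {spec.src_column} = ? ORDER BY {spec.dst_column}"
    )


def neighbors(conn, spec, vertex):
    """Run the translated query and return the neighbor ids as a list."""
    return [row[0] for row in conn.execute(translate_neighbors(spec), (vertex,))]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (customer_id INTEGER, product_id INTEGER)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [(1, 100), (1, 101), (2, 100), (1, 100)],
)

# The same relational table is viewed as two different graphs.
bought = EdgeSpec(table="orders", src_column="customer_id", dst_column="product_id")
bought_by = EdgeSpec(table="orders", src_column="product_id", dst_column="customer_id")

print(neighbors(conn, bought, 1))
print(neighbors(conn, bought_by, 100))
```

Having to write each `EdgeSpec` by hand mirrors the slide's point that the user must define how the relational structure maps onto a graph.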
  • 29. Thank you! Presented by: Konstantinos Xirogiannopoulos and Benjamin Bengfort, May 7, 2015

Editor's Notes

  • #2: Hi, my name is Kostas and this is Ben. Today we’re going to present the research challenges and existing methods of using graph analyses on relational data stores. [SLIDE CHANGE] Today I’d like to talk a little bit about conducting Graph Based Machine Learning on Relational Data
  • #3: As we’ve read and discussed in class, graphs are a valuable data structure, well suited for a range of non-trivial analyses and machine learning tasks. [SLIDE CHANGE] So as we’ve recently read about and talked about in class, graphs are an interesting data structure that is actually well suited not only for trivial graph analyses but also for complex machine learning tasks.
  • #4: To motivate the usage of graphs and graph oriented frameworks even further
  • #6: Analyses like finding clusters, influence propagation (by means of PageRank, for example), or centrality are in essence graph algorithms.
  • #8: But more importantly, graphs and graph-specific languages and frameworks provide a substantially more comprehensible way of implementing algorithms.
  • #9: https://gist.github.com/diogojc/1338222
        G = np.array([[0,0,1,0,0,0,0],
                      [0,1,1,0,0,0,0],
                      [1,0,1,1,0,0,0],
                      [0,0,0,1,1,0,0],
                      [0,0,0,0,0,0,1],
                      [0,0,0,0,0,1,1],
                      [0,0,0,1,1,0,1]])
  • #10: https://groups.google.com/forum/#!topic/gremlin-users/LAm4mzzg8NY Expressing graph algorithms through this framework is a lot more intuitive!
  • #11: So hopefully I’ve convinced you about why we’d want to use Graphs for these types of analyses. Now we also know that Machine learning...
  • #14: Most active data actually sits inside relational databases, where users interact with it in real time via transactions in the web applications that we use every day.
  • #15: Now why is it that we don’t move on? [Image: a rusting old jalopy in Hackberry, a small Arizona town just outside the middle of nowhere. https://flic.kr/p/dqG9Ad]
  • #16: [Image: 1996 McLaren F1 GTR. https://flic.kr/p/oc8gUh] Awesome because:
      - Strong semantics (durability, fault tolerance, integrity constraints).
      - They support truly ACID transactions.
      - They provide assurance because they are mature.