SlideShare a Scribd company logo
WIFI SSID:Spark+AISummit | Password: UnifiedDataAnalytics
Alicia Frame, PhD
Senior Data Scientist, Neo4j
Transforming AI with Graphs:
Real World Examples with Spark & Neo4j
#UnifiedDataAnalytics #SparkAISummit
Transforming AI with Graphs: Real World Examples using Spark and Neo4j
Financial Services Drug Discovery Recommendations
Cybersecurity Predictive Maintenance
Customer Segmentation
Churn Prediction Search/MDM
Graph Data Science Applications
CAR
DRIVES
name: “Dan”
born: May 29, 1970
twitter: “@dan”
name: “Ann”
born: Dec 5, 1975
since:
Jan 10, 2011
brand: “Volvo”
model: “V70”
Latitude: 37.5629900°
Longitude: -122.3255300°
Nodes
• Can have Labels to classify nodes
• Labels have native indexes
Relationships
• Relate nodes by type and direction
Properties
• Attributes of Nodes & Relationships
• Stored as Name/Value pairs
• Can have indexes and composite indexes
MARRIED TO
LIVES WITH
OW
NS
PERSON PERSON
7
Labeled Property Graphs
• Current data science models ignore network structure
• Graphs add highly predictive features to existing ML models
• Otherwise unattainable predictions based on relationships
Novel & More Accurate Predictions
with the Data You Already Have
Machine Learning Pipeline
“The idea is that graph networks are bigger than
any one machine-learning approach.
Graphs bring an ability to generalize about
structure that the individual neural nets don't have.”
"Where do the
graphs come from
that
graph networks
operate over?”
Building a Graph ML Model
Data
Sources
Native Graph
Platform
Machine
Learning
Aggregate Disparate Data
and Cleanse
Build Predictive ModelsUnify Graphs and Engineer
Features
Parquet JSON
and more…
MLlib
and more…
Spark Graph Native Graph
Platform
Machine Learning
Example: Spark & Neo4j Workflow
Graph
Transactions
Graph
Analytics
Cypher 9 in Spark 3.0
to create non-
persistent graphs
MLlib to Train Models
Native Graph Algorithms,
Processing, and Storage
Explore Graphs
Build Graph
Solutions
• Massively scalable
• Powerful data pipelining
• Robust ML Libraries
• Non-persistent, non-native graphs
• Persistent, dynamic graphs
• Graph native query and algorithm
performance
• Constantly growing list of graph
algorithms and embeddings
The Steps of Graph Data Science
Query Based
Knowledge
Graph
Query Based
Feature
Engineering
Graph
Algorithm
Feature
Engineering
Graph
Embeddings
Graph Neural
Networks
DataScienceComplexity
Knowledge
Graphs
Graph Feature
Engineering
Graph Native
Learning
Graph Persistence
Steps Forward in Graph Data Science
Query Based
Knowledge
Graph
Query Based
Feature
Engineering
Graph Algorithm
Feature
Engineering
Graph
Embeddings
Graph Neural
Networks
Enterprise Maturity
DataScienceComplexity
Query based knowledge graphs:
Connecting the Dots at NASA
“Using Neo4j someone from our Orion project found information from the Apollo
project that prevented an issue, saving well over two years of work and one
million dollars of taxpayer funds.”
Steps Forward in Graph Data Science
Query Based
Knowledge
Graph
Graph
Algorithm
Feature
Engineering
Graph
Embeddings
Graph Neural
Networks
Query Based
Feature
Engineering
Enterprise Maturity
DataScienceComplexity
Churn prediction research
has found that simple hand-
engineered features are highly
predictive
• How many calls/texts has
an account made?
• How many of their contacts
have churned?
Query-Based Feature Engineering
Telecom-churn prediction
Telecommunication
networks are easily
represented as graphs
Query-Based Feature Engineering
Telecom-churn prediction
23
Add connected features
based on graph queries to
tabular data
Khan et al, 2015
Spark Graph Native Graph
Platform
Machine Learning
• Merge distributed data
into DataFrames
• Reshape your tables
into graphs
• Explore cypher queries
• Move to Neo4j to build
expert queries
• Persist your graph
Knowledge Graphs:
Getting Started Example with Spark
• Bring query based
graph features to ML
pipeline
Graph
Transactions
Graph
Analytics
Steps Forward in Graph Data Science
Query Based
Feature
Engineering
Graph
Embeddings
Graph Neural
Networks
Query Based
Knowledge
Graph
Graph
Algorithm
Feature
Engineering
Enterprise Maturity
DataScienceComplexity
Feature Engineering is how we combine and process the
data to create new, more meaningful features, such as
clustering or connectivity metrics.
Graph Feature Engineering
Add More Descriptive Features:
- Influence
- Relationships
- Communities
Graph Feature Categories & Algorithms
Pathfinding
& Search
Finds the optimal paths or
evaluates
route availability and quality
Centrality /
Importance
Determines the importance of
distinct nodes in the network
Community
Detection
Detects group clustering or
partition options
Heuristic
Link Prediction
Estimates the likelihood of nodes
forming a relationship
Evaluates how alike nodes
are
Similarity
Embeddings
Learned representations
of connectivity or topology
• Connected components to identify
disjointed graphs sharing identifiers
• PageRank to measure influence and
transaction volumes
• Louvain to identify communities
that frequently interact
• Jaccard to measure account
similarity based on relationships
Financial Crime: Detecting Fraud
Large financial institutions already have existing pipelines
to identify fraud via heuristics and models
Graph based features improve accuracy:
+142,000 Peer Reviewed Publications
Graph Fraud / Anomaly Detection
in the last 10 years
Spark Graph Native Graph
Platform
Machine Learning
• Merge distributed data
into DataFrames
• Reshape your tables
into graphs
• Explore cypher queries
and simple algorithms
• Persist your graph
• Create rule based
features
• Run native graph
algorithms and write to
graph or stream
Graph Feature Engineering:
Getting Started Example with Spark
• Bring graph features
to ML pipeline for
training
Graph
Transactions
Graph
Analytics
Graph Algorithms in Neo4J
• Parallel Breadth First Search
• Parallel Depth First Search
• Shortest Path
• Single-Source Shortest Path
• All Pairs Shortest Path
• Minimum Spanning Tree
• A* Shortest Path
• Yen’s K Shortest Path
• K-Spanning Tree (MST)
• Random Walk
• Degree Centrality
• Closeness Centrality
• CC Variations: Harmonic, Dangalchev,
Wasserman & Faust
• Betweenness Centrality
• Approximate Betweenness Centrality
• PageRank
• Personalized PageRank
• ArticleRank
• Eigenvector Centrality
• Triangle Count
• Clustering Coefficients
• Connected Components (Union Find)
• Strongly Connected Components
• Label Propagation
• Louvain Modularity – 1 Step & Multi-Step
• Balanced Triad (identification)
• Euclidean Distance
• Cosine Similarity
• Jaccard Similarity
• Overlap Similarity
• Pearson Similarity
Pathfinding
& Search
Centrality /
Importance
Community
Detection
Similarity
neo4j.com/docs/
graph-algorithms/current/
Link
Prediction
• Adamic Adar
• Common Neighbors
• Preferential Attachment
• Resource Allocations
• Same Community
• Total Neighbors
Graph Algorithms in Neo4J
• Parallel Breadth First Search
• Parallel Depth First Search
• Shortest Path
• Single-Source Shortest Path
• All Pairs Shortest Path
• Minimum Spanning Tree
• A* Shortest Path
• Yen’s K Shortest Path
• K-Spanning Tree (MST)
• Random Walk
• Degree Centrality
• Closeness Centrality
• CC Variations: Harmonic, Dangalchev,
Wasserman & Faust
• Betweenness Centrality
• Approximate Betweenness Centrality
• PageRank
• Personalized PageRank
• ArticleRank
• Eigenvector Centrality
• Triangle Count
• Clustering Coefficients
• Connected Components (Union Find)
• Strongly Connected Components
• Label Propagation
• Louvain Modularity – 1 Step & Multi-Step
• Balanced Triad (identification)
• Euclidean Distance
• Cosine Similarity
• Jaccard Similarity
• Overlap Similarity
• Pearson Similarity
Pathfinding
& Search
Centrality /
Importance
Community
Detection
Similarity
neo4j.com/docs/
graph-algorithms/current/
Link
Prediction
• Adamic Adar
• Common Neighbors
• Preferential Attachment
• Resource Allocations
• Same Community
• Total Neighbors
Steps Forward in Graph Data Science
Query Based
Knowledge
Graph
Graph
Algorithm
Feature
Engineering
Graph Neural
Networks
Query Based
Feature
Engineering
Graph
Embeddings
Enterprise Maturity
DataScienceComplexity
Embedding transforms graphs into a vector, or set of
vectors, describing topology, connectivity, or attributes of
nodes and edges in the graph
Graph Embeddings
• Vertex embeddings: describe connectivity of each node
• Path embeddings: traversals across the graph
• Graph embeddings: encode an entire graph into a single vector
Explainable Reasoning over Knowledge Graphs for
Recommendation
Graph Embeddings - Recommendations
Explainable Reasoning over Knowledge Graphs for
Recommendation
Graph Embeddings - Recommendations
Spark Graph Native Graph
Platform
Machine Learning
• Merge distributed data into
DataFrames
• Reshape your tables
into graphs
• Explore cypher queries and
simple algorithms
• Move to Neo4j to build
expert queries
• Write to persist
• Stay tuned for DeepWalk
and DeepGL algorithms
Graph Feature Engineering:
Getting Started Example with Spark
• Bring graph features to
ML pipeline for training
Graph
Transactions
Graph
Analytics
Steps Forward in Graph Data Science
Query Based
Knowledge
Graph
Graph
Algorithm
Feature
Engineering
Query Based
Feature
Engineering
Graph Neural
Networks
Graph
Embeddings
Enterprise Maturity
DataScienceComplexity
Deep Learning refers to training multi-layer neural
networks using gradient descent
Graph Native Learning
Graph Native Learning refers to deep learning models
that take a graph as an input, performs computations,
and return a graph
Graph Native Learning
Battaglia et al, 2018
Example: electron path prediction
Bradshaw et al, 2019
Graph Native Learning
Given reactants and reagents, what will the
products be?
Given reactants and reagents, what will the
products be?
Example: electron path prediction
Graph Native Learning
Progressing in Graph Data Science
Query Based
Knowledge
Graph
Query Based
Feature
Engineering
Graph
Algorithm
Feature
Engineering
Graph
Embeddings
Graph Neural
Networks
Enterprise Maturity
DataScienceComplexity
Knowledge
Graphs
Graph Feature
Engineering
Graph Native
Learning
Graph Persistence
Resources
Business
• neo4j.com/use-cases/artificial-intelligence-analytics/
Data Scientists/Developers
• neo4j.com/sandbox
• neo4j.com/developer/
• community.neo4j.com
alicia.frame@neo4j.com
@aliciaframe1
44#UnifiedAnalytics #SparkAISummit
DON’T FORGET TO RATE
AND REVIEW THE SESSIONS
SEARCH SPARK + AI SUMMIT
Ad

More Related Content

What's hot (20)

Data Engineering Basics
Data Engineering BasicsData Engineering Basics
Data Engineering Basics
Catherine Kimani
 
RDBMS to Graph
RDBMS to GraphRDBMS to Graph
RDBMS to Graph
Neo4j
 
Introduction to Neo4j for the Emirates & Bahrain
Introduction to Neo4j for the Emirates & BahrainIntroduction to Neo4j for the Emirates & Bahrain
Introduction to Neo4j for the Emirates & Bahrain
Neo4j
 
Introduction to Graph Databases
Introduction to Graph DatabasesIntroduction to Graph Databases
Introduction to Graph Databases
DataStax
 
A Connections-first Approach to Supply Chain Optimization
A Connections-first Approach to Supply Chain OptimizationA Connections-first Approach to Supply Chain Optimization
A Connections-first Approach to Supply Chain Optimization
Neo4j
 
Modernizing to a Cloud Data Architecture
Modernizing to a Cloud Data ArchitectureModernizing to a Cloud Data Architecture
Modernizing to a Cloud Data Architecture
Databricks
 
Neo4j GraphDay Seattle- Sept19- neo4j basic training
Neo4j GraphDay Seattle- Sept19- neo4j basic trainingNeo4j GraphDay Seattle- Sept19- neo4j basic training
Neo4j GraphDay Seattle- Sept19- neo4j basic training
Neo4j
 
Introduction to Apache Spark
Introduction to Apache SparkIntroduction to Apache Spark
Introduction to Apache Spark
Rahul Jain
 
Optimizing Cypher Queries in Neo4j
Optimizing Cypher Queries in Neo4jOptimizing Cypher Queries in Neo4j
Optimizing Cypher Queries in Neo4j
Neo4j
 
Data Product Architectures
Data Product ArchitecturesData Product Architectures
Data Product Architectures
Benjamin Bengfort
 
Big data architectures and the data lake
Big data architectures and the data lakeBig data architectures and the data lake
Big data architectures and the data lake
James Serra
 
Get Started with the Most Advanced Edition Yet of Neo4j Graph Data Science
Get Started with the Most Advanced Edition Yet of Neo4j Graph Data ScienceGet Started with the Most Advanced Edition Yet of Neo4j Graph Data Science
Get Started with the Most Advanced Edition Yet of Neo4j Graph Data Science
Neo4j
 
Introduction to snowflake
Introduction to snowflakeIntroduction to snowflake
Introduction to snowflake
Sunil Gurav
 
Neo4j 4 Overview
Neo4j 4 OverviewNeo4j 4 Overview
Neo4j 4 Overview
Neo4j
 
Delta lake and the delta architecture
Delta lake and the delta architectureDelta lake and the delta architecture
Delta lake and the delta architecture
Adam Doyle
 
Apache Spark - Basics of RDD | Big Data Hadoop Spark Tutorial | CloudxLab
Apache Spark - Basics of RDD | Big Data Hadoop Spark Tutorial | CloudxLabApache Spark - Basics of RDD | Big Data Hadoop Spark Tutorial | CloudxLab
Apache Spark - Basics of RDD | Big Data Hadoop Spark Tutorial | CloudxLab
CloudxLab
 
Intro to Delta Lake
Intro to Delta LakeIntro to Delta Lake
Intro to Delta Lake
Databricks
 
Data Lakehouse Symposium | Day 4
Data Lakehouse Symposium | Day 4Data Lakehouse Symposium | Day 4
Data Lakehouse Symposium | Day 4
Databricks
 
Spark streaming , Spark SQL
Spark streaming , Spark SQLSpark streaming , Spark SQL
Spark streaming , Spark SQL
Yousun Jeong
 
Intro to Neo4j presentation
Intro to Neo4j presentationIntro to Neo4j presentation
Intro to Neo4j presentation
jexp
 
RDBMS to Graph
RDBMS to GraphRDBMS to Graph
RDBMS to Graph
Neo4j
 
Introduction to Neo4j for the Emirates & Bahrain
Introduction to Neo4j for the Emirates & BahrainIntroduction to Neo4j for the Emirates & Bahrain
Introduction to Neo4j for the Emirates & Bahrain
Neo4j
 
Introduction to Graph Databases
Introduction to Graph DatabasesIntroduction to Graph Databases
Introduction to Graph Databases
DataStax
 
A Connections-first Approach to Supply Chain Optimization
A Connections-first Approach to Supply Chain OptimizationA Connections-first Approach to Supply Chain Optimization
A Connections-first Approach to Supply Chain Optimization
Neo4j
 
Modernizing to a Cloud Data Architecture
Modernizing to a Cloud Data ArchitectureModernizing to a Cloud Data Architecture
Modernizing to a Cloud Data Architecture
Databricks
 
Neo4j GraphDay Seattle- Sept19- neo4j basic training
Neo4j GraphDay Seattle- Sept19- neo4j basic trainingNeo4j GraphDay Seattle- Sept19- neo4j basic training
Neo4j GraphDay Seattle- Sept19- neo4j basic training
Neo4j
 
Introduction to Apache Spark
Introduction to Apache SparkIntroduction to Apache Spark
Introduction to Apache Spark
Rahul Jain
 
Optimizing Cypher Queries in Neo4j
Optimizing Cypher Queries in Neo4jOptimizing Cypher Queries in Neo4j
Optimizing Cypher Queries in Neo4j
Neo4j
 
Big data architectures and the data lake
Big data architectures and the data lakeBig data architectures and the data lake
Big data architectures and the data lake
James Serra
 
Get Started with the Most Advanced Edition Yet of Neo4j Graph Data Science
Get Started with the Most Advanced Edition Yet of Neo4j Graph Data ScienceGet Started with the Most Advanced Edition Yet of Neo4j Graph Data Science
Get Started with the Most Advanced Edition Yet of Neo4j Graph Data Science
Neo4j
 
Introduction to snowflake
Introduction to snowflakeIntroduction to snowflake
Introduction to snowflake
Sunil Gurav
 
Neo4j 4 Overview
Neo4j 4 OverviewNeo4j 4 Overview
Neo4j 4 Overview
Neo4j
 
Delta lake and the delta architecture
Delta lake and the delta architectureDelta lake and the delta architecture
Delta lake and the delta architecture
Adam Doyle
 
Apache Spark - Basics of RDD | Big Data Hadoop Spark Tutorial | CloudxLab
Apache Spark - Basics of RDD | Big Data Hadoop Spark Tutorial | CloudxLabApache Spark - Basics of RDD | Big Data Hadoop Spark Tutorial | CloudxLab
Apache Spark - Basics of RDD | Big Data Hadoop Spark Tutorial | CloudxLab
CloudxLab
 
Intro to Delta Lake
Intro to Delta LakeIntro to Delta Lake
Intro to Delta Lake
Databricks
 
Data Lakehouse Symposium | Day 4
Data Lakehouse Symposium | Day 4Data Lakehouse Symposium | Day 4
Data Lakehouse Symposium | Day 4
Databricks
 
Spark streaming , Spark SQL
Spark streaming , Spark SQLSpark streaming , Spark SQL
Spark streaming , Spark SQL
Yousun Jeong
 
Intro to Neo4j presentation
Intro to Neo4j presentationIntro to Neo4j presentation
Intro to Neo4j presentation
jexp
 

Similar to Transforming AI with Graphs: Real World Examples using Spark and Neo4j (20)

Leveraging Graphs for Better AI
Leveraging Graphs for Better AILeveraging Graphs for Better AI
Leveraging Graphs for Better AI
Neo4j
 
How Graph Technology is Changing AI
How Graph Technology is Changing AIHow Graph Technology is Changing AI
How Graph Technology is Changing AI
Databricks
 
Leveraging Graphs for Better AI
Leveraging Graphs for Better AILeveraging Graphs for Better AI
Leveraging Graphs for Better AI
Neo4j
 
How Graphs Enhance AI
How Graphs Enhance AIHow Graphs Enhance AI
How Graphs Enhance AI
Neo4j
 
The Analytics Frontier of the Hadoop Eco-System
The Analytics Frontier of the Hadoop Eco-SystemThe Analytics Frontier of the Hadoop Eco-System
The Analytics Frontier of the Hadoop Eco-System
inside-BigData.com
 
How Graphs are Changing AI
How Graphs are Changing AIHow Graphs are Changing AI
How Graphs are Changing AI
Neo4j
 
Neo4j GraphTalk Oslo - Introduction to Graphs
Neo4j GraphTalk Oslo - Introduction to GraphsNeo4j GraphTalk Oslo - Introduction to Graphs
Neo4j GraphTalk Oslo - Introduction to Graphs
Neo4j
 
How Graph Algorithms Answer your Business Questions in Banking and Beyond
How Graph Algorithms Answer your Business Questions in Banking and BeyondHow Graph Algorithms Answer your Business Questions in Banking and Beyond
How Graph Algorithms Answer your Business Questions in Banking and Beyond
Neo4j
 
What Is GDS and Neo4j’s GDS Library
What Is GDS and Neo4j’s GDS LibraryWhat Is GDS and Neo4j’s GDS Library
What Is GDS and Neo4j’s GDS Library
Neo4j
 
GraphTour 2020 - Graphs & AI: A Path for Data Science
GraphTour 2020 - Graphs & AI: A Path for Data ScienceGraphTour 2020 - Graphs & AI: A Path for Data Science
GraphTour 2020 - Graphs & AI: A Path for Data Science
Neo4j
 
State of Florida Neo4j Graph Briefing - Cyber IAM
State of Florida Neo4j Graph Briefing - Cyber IAMState of Florida Neo4j Graph Briefing - Cyber IAM
State of Florida Neo4j Graph Briefing - Cyber IAM
Neo4j
 
Graph Databases
Graph DatabasesGraph Databases
Graph Databases
Girish Khanzode
 
Morpheus - SQL and Cypher in Apache Spark
Morpheus - SQL and Cypher in Apache SparkMorpheus - SQL and Cypher in Apache Spark
Morpheus - SQL and Cypher in Apache Spark
Henning Kropp
 
Morpheus SQL and Cypher® in Apache® Spark - Big Data Meetup Munich
Morpheus SQL and Cypher® in Apache® Spark - Big Data Meetup MunichMorpheus SQL and Cypher® in Apache® Spark - Big Data Meetup Munich
Morpheus SQL and Cypher® in Apache® Spark - Big Data Meetup Munich
Martin Junghanns
 
Odsc 2019 entity_reputation_knowledge_graph
Odsc 2019 entity_reputation_knowledge_graphOdsc 2019 entity_reputation_knowledge_graph
Odsc 2019 entity_reputation_knowledge_graph
venkatramanJ4
 
Neo4j Training Introduction
Neo4j Training IntroductionNeo4j Training Introduction
Neo4j Training Introduction
Max De Marzi
 
Nodes2020 | Graph of enterprise_metadata | NEO4J Conference
Nodes2020 | Graph of enterprise_metadata | NEO4J ConferenceNodes2020 | Graph of enterprise_metadata | NEO4J Conference
Nodes2020 | Graph of enterprise_metadata | NEO4J Conference
Deepak Chandramouli
 
La bi, l'informatique décisionnelle et les graphes
La bi, l'informatique décisionnelle et les graphesLa bi, l'informatique décisionnelle et les graphes
La bi, l'informatique décisionnelle et les graphes
Cédric Fauvet
 
3. Relationships Matter: Using Connected Data for Better Machine Learning
3. Relationships Matter: Using Connected Data for Better Machine Learning3. Relationships Matter: Using Connected Data for Better Machine Learning
3. Relationships Matter: Using Connected Data for Better Machine Learning
Neo4j
 
Using SparkML to Power a DSaaS (Data Science as a Service): Spark Summit East...
Using SparkML to Power a DSaaS (Data Science as a Service): Spark Summit East...Using SparkML to Power a DSaaS (Data Science as a Service): Spark Summit East...
Using SparkML to Power a DSaaS (Data Science as a Service): Spark Summit East...
Spark Summit
 
Leveraging Graphs for Better AI
Leveraging Graphs for Better AILeveraging Graphs for Better AI
Leveraging Graphs for Better AI
Neo4j
 
How Graph Technology is Changing AI
How Graph Technology is Changing AIHow Graph Technology is Changing AI
How Graph Technology is Changing AI
Databricks
 
Leveraging Graphs for Better AI
Leveraging Graphs for Better AILeveraging Graphs for Better AI
Leveraging Graphs for Better AI
Neo4j
 
How Graphs Enhance AI
How Graphs Enhance AIHow Graphs Enhance AI
How Graphs Enhance AI
Neo4j
 
The Analytics Frontier of the Hadoop Eco-System
The Analytics Frontier of the Hadoop Eco-SystemThe Analytics Frontier of the Hadoop Eco-System
The Analytics Frontier of the Hadoop Eco-System
inside-BigData.com
 
How Graphs are Changing AI
How Graphs are Changing AIHow Graphs are Changing AI
How Graphs are Changing AI
Neo4j
 
Neo4j GraphTalk Oslo - Introduction to Graphs
Neo4j GraphTalk Oslo - Introduction to GraphsNeo4j GraphTalk Oslo - Introduction to Graphs
Neo4j GraphTalk Oslo - Introduction to Graphs
Neo4j
 
How Graph Algorithms Answer your Business Questions in Banking and Beyond
How Graph Algorithms Answer your Business Questions in Banking and BeyondHow Graph Algorithms Answer your Business Questions in Banking and Beyond
How Graph Algorithms Answer your Business Questions in Banking and Beyond
Neo4j
 
What Is GDS and Neo4j’s GDS Library
What Is GDS and Neo4j’s GDS LibraryWhat Is GDS and Neo4j’s GDS Library
What Is GDS and Neo4j’s GDS Library
Neo4j
 
GraphTour 2020 - Graphs & AI: A Path for Data Science
GraphTour 2020 - Graphs & AI: A Path for Data ScienceGraphTour 2020 - Graphs & AI: A Path for Data Science
GraphTour 2020 - Graphs & AI: A Path for Data Science
Neo4j
 
State of Florida Neo4j Graph Briefing - Cyber IAM
State of Florida Neo4j Graph Briefing - Cyber IAMState of Florida Neo4j Graph Briefing - Cyber IAM
State of Florida Neo4j Graph Briefing - Cyber IAM
Neo4j
 
Morpheus - SQL and Cypher in Apache Spark
Morpheus - SQL and Cypher in Apache SparkMorpheus - SQL and Cypher in Apache Spark
Morpheus - SQL and Cypher in Apache Spark
Henning Kropp
 
Morpheus SQL and Cypher® in Apache® Spark - Big Data Meetup Munich
Morpheus SQL and Cypher® in Apache® Spark - Big Data Meetup MunichMorpheus SQL and Cypher® in Apache® Spark - Big Data Meetup Munich
Morpheus SQL and Cypher® in Apache® Spark - Big Data Meetup Munich
Martin Junghanns
 
Odsc 2019 entity_reputation_knowledge_graph
Odsc 2019 entity_reputation_knowledge_graphOdsc 2019 entity_reputation_knowledge_graph
Odsc 2019 entity_reputation_knowledge_graph
venkatramanJ4
 
Neo4j Training Introduction
Neo4j Training IntroductionNeo4j Training Introduction
Neo4j Training Introduction
Max De Marzi
 
Nodes2020 | Graph of enterprise_metadata | NEO4J Conference
Nodes2020 | Graph of enterprise_metadata | NEO4J ConferenceNodes2020 | Graph of enterprise_metadata | NEO4J Conference
Nodes2020 | Graph of enterprise_metadata | NEO4J Conference
Deepak Chandramouli
 
La bi, l'informatique décisionnelle et les graphes
La bi, l'informatique décisionnelle et les graphesLa bi, l'informatique décisionnelle et les graphes
La bi, l'informatique décisionnelle et les graphes
Cédric Fauvet
 
3. Relationships Matter: Using Connected Data for Better Machine Learning
3. Relationships Matter: Using Connected Data for Better Machine Learning3. Relationships Matter: Using Connected Data for Better Machine Learning
3. Relationships Matter: Using Connected Data for Better Machine Learning
Neo4j
 
Using SparkML to Power a DSaaS (Data Science as a Service): Spark Summit East...
Using SparkML to Power a DSaaS (Data Science as a Service): Spark Summit East...Using SparkML to Power a DSaaS (Data Science as a Service): Spark Summit East...
Using SparkML to Power a DSaaS (Data Science as a Service): Spark Summit East...
Spark Summit
 
Ad

More from Databricks (20)

DW Migration Webinar-March 2022.pptx
DW Migration Webinar-March 2022.pptxDW Migration Webinar-March 2022.pptx
DW Migration Webinar-March 2022.pptx
Databricks
 
Data Lakehouse Symposium | Day 1 | Part 1
Data Lakehouse Symposium | Day 1 | Part 1Data Lakehouse Symposium | Day 1 | Part 1
Data Lakehouse Symposium | Day 1 | Part 1
Databricks
 
Data Lakehouse Symposium | Day 1 | Part 2
Data Lakehouse Symposium | Day 1 | Part 2Data Lakehouse Symposium | Day 1 | Part 2
Data Lakehouse Symposium | Day 1 | Part 2
Databricks
 
Data Lakehouse Symposium | Day 2
Data Lakehouse Symposium | Day 2Data Lakehouse Symposium | Day 2
Data Lakehouse Symposium | Day 2
Databricks
 
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
Databricks
 
Democratizing Data Quality Through a Centralized Platform
Democratizing Data Quality Through a Centralized PlatformDemocratizing Data Quality Through a Centralized Platform
Democratizing Data Quality Through a Centralized Platform
Databricks
 
Learn to Use Databricks for Data Science
Learn to Use Databricks for Data ScienceLearn to Use Databricks for Data Science
Learn to Use Databricks for Data Science
Databricks
 
Why APM Is Not the Same As ML Monitoring
Why APM Is Not the Same As ML MonitoringWhy APM Is Not the Same As ML Monitoring
Why APM Is Not the Same As ML Monitoring
Databricks
 
The Function, the Context, and the Data—Enabling ML Ops at Stitch Fix
The Function, the Context, and the Data—Enabling ML Ops at Stitch FixThe Function, the Context, and the Data—Enabling ML Ops at Stitch Fix
The Function, the Context, and the Data—Enabling ML Ops at Stitch Fix
Databricks
 
Stage Level Scheduling Improving Big Data and AI Integration
Stage Level Scheduling Improving Big Data and AI IntegrationStage Level Scheduling Improving Big Data and AI Integration
Stage Level Scheduling Improving Big Data and AI Integration
Databricks
 
Simplify Data Conversion from Spark to TensorFlow and PyTorch
Simplify Data Conversion from Spark to TensorFlow and PyTorchSimplify Data Conversion from Spark to TensorFlow and PyTorch
Simplify Data Conversion from Spark to TensorFlow and PyTorch
Databricks
 
Scaling your Data Pipelines with Apache Spark on Kubernetes
Scaling your Data Pipelines with Apache Spark on KubernetesScaling your Data Pipelines with Apache Spark on Kubernetes
Scaling your Data Pipelines with Apache Spark on Kubernetes
Databricks
 
Scaling and Unifying SciKit Learn and Apache Spark Pipelines
Scaling and Unifying SciKit Learn and Apache Spark PipelinesScaling and Unifying SciKit Learn and Apache Spark Pipelines
Scaling and Unifying SciKit Learn and Apache Spark Pipelines
Databricks
 
Sawtooth Windows for Feature Aggregations
Sawtooth Windows for Feature AggregationsSawtooth Windows for Feature Aggregations
Sawtooth Windows for Feature Aggregations
Databricks
 
Redis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
Redis + Apache Spark = Swiss Army Knife Meets Kitchen SinkRedis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
Redis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
Databricks
 
Re-imagine Data Monitoring with whylogs and Spark
Re-imagine Data Monitoring with whylogs and SparkRe-imagine Data Monitoring with whylogs and Spark
Re-imagine Data Monitoring with whylogs and Spark
Databricks
 
Raven: End-to-end Optimization of ML Prediction Queries
Raven: End-to-end Optimization of ML Prediction QueriesRaven: End-to-end Optimization of ML Prediction Queries
Raven: End-to-end Optimization of ML Prediction Queries
Databricks
 
Processing Large Datasets for ADAS Applications using Apache Spark
Processing Large Datasets for ADAS Applications using Apache SparkProcessing Large Datasets for ADAS Applications using Apache Spark
Processing Large Datasets for ADAS Applications using Apache Spark
Databricks
 
Massive Data Processing in Adobe Using Delta Lake
Massive Data Processing in Adobe Using Delta LakeMassive Data Processing in Adobe Using Delta Lake
Massive Data Processing in Adobe Using Delta Lake
Databricks
 
Machine Learning CI/CD for Email Attack Detection
Machine Learning CI/CD for Email Attack DetectionMachine Learning CI/CD for Email Attack Detection
Machine Learning CI/CD for Email Attack Detection
Databricks
 
DW Migration Webinar-March 2022.pptx
DW Migration Webinar-March 2022.pptxDW Migration Webinar-March 2022.pptx
DW Migration Webinar-March 2022.pptx
Databricks
 
Data Lakehouse Symposium | Day 1 | Part 1
Data Lakehouse Symposium | Day 1 | Part 1Data Lakehouse Symposium | Day 1 | Part 1
Data Lakehouse Symposium | Day 1 | Part 1
Databricks
 
Data Lakehouse Symposium | Day 1 | Part 2
Data Lakehouse Symposium | Day 1 | Part 2Data Lakehouse Symposium | Day 1 | Part 2
Data Lakehouse Symposium | Day 1 | Part 2
Databricks
 
Data Lakehouse Symposium | Day 2
Data Lakehouse Symposium | Day 2Data Lakehouse Symposium | Day 2
Data Lakehouse Symposium | Day 2
Databricks
 
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
Databricks
 
Democratizing Data Quality Through a Centralized Platform
Democratizing Data Quality Through a Centralized PlatformDemocratizing Data Quality Through a Centralized Platform
Democratizing Data Quality Through a Centralized Platform
Databricks
 
Learn to Use Databricks for Data Science
Learn to Use Databricks for Data ScienceLearn to Use Databricks for Data Science
Learn to Use Databricks for Data Science
Databricks
 
Why APM Is Not the Same As ML Monitoring
Why APM Is Not the Same As ML MonitoringWhy APM Is Not the Same As ML Monitoring
Why APM Is Not the Same As ML Monitoring
Databricks
 
The Function, the Context, and the Data—Enabling ML Ops at Stitch Fix
The Function, the Context, and the Data—Enabling ML Ops at Stitch FixThe Function, the Context, and the Data—Enabling ML Ops at Stitch Fix
The Function, the Context, and the Data—Enabling ML Ops at Stitch Fix
Databricks
 
Stage Level Scheduling Improving Big Data and AI Integration
Stage Level Scheduling Improving Big Data and AI IntegrationStage Level Scheduling Improving Big Data and AI Integration
Stage Level Scheduling Improving Big Data and AI Integration
Databricks
 
Simplify Data Conversion from Spark to TensorFlow and PyTorch
Simplify Data Conversion from Spark to TensorFlow and PyTorchSimplify Data Conversion from Spark to TensorFlow and PyTorch
Simplify Data Conversion from Spark to TensorFlow and PyTorch
Databricks
 
Scaling your Data Pipelines with Apache Spark on Kubernetes
Scaling your Data Pipelines with Apache Spark on KubernetesScaling your Data Pipelines with Apache Spark on Kubernetes
Scaling your Data Pipelines with Apache Spark on Kubernetes
Databricks
 
Scaling and Unifying SciKit Learn and Apache Spark Pipelines
Scaling and Unifying SciKit Learn and Apache Spark PipelinesScaling and Unifying SciKit Learn and Apache Spark Pipelines
Scaling and Unifying SciKit Learn and Apache Spark Pipelines
Databricks
 
Sawtooth Windows for Feature Aggregations
Sawtooth Windows for Feature AggregationsSawtooth Windows for Feature Aggregations
Sawtooth Windows for Feature Aggregations
Databricks
 
Redis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
Redis + Apache Spark = Swiss Army Knife Meets Kitchen SinkRedis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
Redis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
Databricks
 
Re-imagine Data Monitoring with whylogs and Spark
Re-imagine Data Monitoring with whylogs and SparkRe-imagine Data Monitoring with whylogs and Spark
Re-imagine Data Monitoring with whylogs and Spark
Databricks
 
Raven: End-to-end Optimization of ML Prediction Queries
Raven: End-to-end Optimization of ML Prediction QueriesRaven: End-to-end Optimization of ML Prediction Queries
Raven: End-to-end Optimization of ML Prediction Queries
Databricks
 
Processing Large Datasets for ADAS Applications using Apache Spark
Processing Large Datasets for ADAS Applications using Apache SparkProcessing Large Datasets for ADAS Applications using Apache Spark
Processing Large Datasets for ADAS Applications using Apache Spark
Databricks
 
Massive Data Processing in Adobe Using Delta Lake
Massive Data Processing in Adobe Using Delta LakeMassive Data Processing in Adobe Using Delta Lake
Massive Data Processing in Adobe Using Delta Lake
Databricks
 
Machine Learning CI/CD for Email Attack Detection
Machine Learning CI/CD for Email Attack DetectionMachine Learning CI/CD for Email Attack Detection
Machine Learning CI/CD for Email Attack Detection
Databricks
 
Ad

Recently uploaded (20)

04302025_CCC TUG_DataVista: The Design Story
04302025_CCC TUG_DataVista: The Design Story04302025_CCC TUG_DataVista: The Design Story
04302025_CCC TUG_DataVista: The Design Story
ccctableauusergroup
 
AI Competitor Analysis: How to Monitor and Outperform Your Competitors
AI Competitor Analysis: How to Monitor and Outperform Your CompetitorsAI Competitor Analysis: How to Monitor and Outperform Your Competitors
AI Competitor Analysis: How to Monitor and Outperform Your Competitors
Contify
 
chapter3 Central Tendency statistics.ppt
chapter3 Central Tendency statistics.pptchapter3 Central Tendency statistics.ppt
chapter3 Central Tendency statistics.ppt
justinebandajbn
 
Developing Security Orchestration, Automation, and Response Applications
Developing Security Orchestration, Automation, and Response ApplicationsDeveloping Security Orchestration, Automation, and Response Applications
Developing Security Orchestration, Automation, and Response Applications
VICTOR MAESTRE RAMIREZ
 
chapter 4 Variability statistical research .pptx
chapter 4 Variability statistical research .pptxchapter 4 Variability statistical research .pptx
chapter 4 Variability statistical research .pptx
justinebandajbn
 
Just-In-Timeasdfffffffghhhhhhhhhhj Systems.ppt
Just-In-Timeasdfffffffghhhhhhhhhhj Systems.pptJust-In-Timeasdfffffffghhhhhhhhhhj Systems.ppt
Just-In-Timeasdfffffffghhhhhhhhhhj Systems.ppt
ssuser5f8f49
 
GenAI for Quant Analytics: survey-analytics.ai
GenAI for Quant Analytics: survey-analytics.aiGenAI for Quant Analytics: survey-analytics.ai
GenAI for Quant Analytics: survey-analytics.ai
Inspirient
 
Ppt. Nikhil.pptxnshwuudgcudisisshvehsjks
Ppt. Nikhil.pptxnshwuudgcudisisshvehsjksPpt. Nikhil.pptxnshwuudgcudisisshvehsjks
Ppt. Nikhil.pptxnshwuudgcudisisshvehsjks
panchariyasahil
 
LLM finetuning for multiple choice google bert
LLM finetuning for multiple choice google bertLLM finetuning for multiple choice google bert
LLM finetuning for multiple choice google bert
ChadapornK
 
Adobe Analytics NOAM Central User Group April 2025 Agent AI: Uncovering the S...
Adobe Analytics NOAM Central User Group April 2025 Agent AI: Uncovering the S...Adobe Analytics NOAM Central User Group April 2025 Agent AI: Uncovering the S...
Adobe Analytics NOAM Central User Group April 2025 Agent AI: Uncovering the S...
gmuir1066
 
Safety Innovation in Mt. Vernon A Westchester County Model for New Rochelle a...
Safety Innovation in Mt. Vernon A Westchester County Model for New Rochelle a...Safety Innovation in Mt. Vernon A Westchester County Model for New Rochelle a...
Safety Innovation in Mt. Vernon A Westchester County Model for New Rochelle a...
James Francis Paradigm Asset Management
 
03 Daniel 2-notes.ppt seminario escatologia
03 Daniel 2-notes.ppt seminario escatologia03 Daniel 2-notes.ppt seminario escatologia
03 Daniel 2-notes.ppt seminario escatologia
Alexander Romero Arosquipa
 
VKS-Python Basics for Beginners and advance.pptx
VKS-Python Basics for Beginners and advance.pptxVKS-Python Basics for Beginners and advance.pptx
VKS-Python Basics for Beginners and advance.pptx
Vinod Srivastava
 
computer organization and assembly language.docx
computer organization and assembly language.docxcomputer organization and assembly language.docx
computer organization and assembly language.docx
alisoftwareengineer1
 
FPET_Implementation_2_MA to 360 Engage Direct.pptx
FPET_Implementation_2_MA to 360 Engage Direct.pptxFPET_Implementation_2_MA to 360 Engage Direct.pptx
FPET_Implementation_2_MA to 360 Engage Direct.pptx
ssuser4ef83d
 
Defense Against LLM Scheming 2025_04_28.pptx
Defense Against LLM Scheming 2025_04_28.pptxDefense Against LLM Scheming 2025_04_28.pptx
Defense Against LLM Scheming 2025_04_28.pptx
Greg Makowski
 
Geometry maths presentation for begginers
Geometry maths presentation for begginersGeometry maths presentation for begginers
Geometry maths presentation for begginers
zrjacob283
 
Simple_AI_Explanation_English somplr.pptx
Simple_AI_Explanation_English somplr.pptxSimple_AI_Explanation_English somplr.pptx
Simple_AI_Explanation_English somplr.pptx
ssuser2aa19f
 
Medical Dataset including visualizations
Medical Dataset including visualizationsMedical Dataset including visualizations
Medical Dataset including visualizations
vishrut8750588758
 
Data Science Courses in India iim skills
Data Science Courses in India iim skillsData Science Courses in India iim skills
Data Science Courses in India iim skills
dharnathakur29
 
04302025_CCC TUG_DataVista: The Design Story
04302025_CCC TUG_DataVista: The Design Story04302025_CCC TUG_DataVista: The Design Story
04302025_CCC TUG_DataVista: The Design Story
ccctableauusergroup
 
AI Competitor Analysis: How to Monitor and Outperform Your Competitors
AI Competitor Analysis: How to Monitor and Outperform Your CompetitorsAI Competitor Analysis: How to Monitor and Outperform Your Competitors
AI Competitor Analysis: How to Monitor and Outperform Your Competitors
Contify
 
chapter3 Central Tendency statistics.ppt
chapter3 Central Tendency statistics.pptchapter3 Central Tendency statistics.ppt
chapter3 Central Tendency statistics.ppt
justinebandajbn
 
Developing Security Orchestration, Automation, and Response Applications
Developing Security Orchestration, Automation, and Response ApplicationsDeveloping Security Orchestration, Automation, and Response Applications
Developing Security Orchestration, Automation, and Response Applications
VICTOR MAESTRE RAMIREZ
 
chapter 4 Variability statistical research .pptx
chapter 4 Variability statistical research .pptxchapter 4 Variability statistical research .pptx
chapter 4 Variability statistical research .pptx
justinebandajbn
 
Just-In-Timeasdfffffffghhhhhhhhhhj Systems.ppt
Just-In-Timeasdfffffffghhhhhhhhhhj Systems.pptJust-In-Timeasdfffffffghhhhhhhhhhj Systems.ppt
Just-In-Timeasdfffffffghhhhhhhhhhj Systems.ppt
ssuser5f8f49
 
GenAI for Quant Analytics: survey-analytics.ai
GenAI for Quant Analytics: survey-analytics.aiGenAI for Quant Analytics: survey-analytics.ai
GenAI for Quant Analytics: survey-analytics.ai
Inspirient
 
Ppt. Nikhil.pptxnshwuudgcudisisshvehsjks
Ppt. Nikhil.pptxnshwuudgcudisisshvehsjksPpt. Nikhil.pptxnshwuudgcudisisshvehsjks
Ppt. Nikhil.pptxnshwuudgcudisisshvehsjks
panchariyasahil
 
LLM finetuning for multiple choice google bert
LLM finetuning for multiple choice google bertLLM finetuning for multiple choice google bert
LLM finetuning for multiple choice google bert
ChadapornK
 
Adobe Analytics NOAM Central User Group April 2025 Agent AI: Uncovering the S...
Adobe Analytics NOAM Central User Group April 2025 Agent AI: Uncovering the S...Adobe Analytics NOAM Central User Group April 2025 Agent AI: Uncovering the S...
Adobe Analytics NOAM Central User Group April 2025 Agent AI: Uncovering the S...
gmuir1066
 
Safety Innovation in Mt. Vernon A Westchester County Model for New Rochelle a...
Safety Innovation in Mt. Vernon A Westchester County Model for New Rochelle a...Safety Innovation in Mt. Vernon A Westchester County Model for New Rochelle a...
Safety Innovation in Mt. Vernon A Westchester County Model for New Rochelle a...
James Francis Paradigm Asset Management
 
VKS-Python Basics for Beginners and advance.pptx
VKS-Python Basics for Beginners and advance.pptxVKS-Python Basics for Beginners and advance.pptx
VKS-Python Basics for Beginners and advance.pptx
Vinod Srivastava
 
computer organization and assembly language.docx
computer organization and assembly language.docxcomputer organization and assembly language.docx
computer organization and assembly language.docx
alisoftwareengineer1
 
FPET_Implementation_2_MA to 360 Engage Direct.pptx
FPET_Implementation_2_MA to 360 Engage Direct.pptxFPET_Implementation_2_MA to 360 Engage Direct.pptx
FPET_Implementation_2_MA to 360 Engage Direct.pptx
ssuser4ef83d
 
Defense Against LLM Scheming 2025_04_28.pptx
Defense Against LLM Scheming 2025_04_28.pptxDefense Against LLM Scheming 2025_04_28.pptx
Defense Against LLM Scheming 2025_04_28.pptx
Greg Makowski
 
Geometry maths presentation for begginers
Geometry maths presentation for begginersGeometry maths presentation for begginers
Geometry maths presentation for begginers
zrjacob283
 
Simple_AI_Explanation_English somplr.pptx
Simple_AI_Explanation_English somplr.pptxSimple_AI_Explanation_English somplr.pptx
Simple_AI_Explanation_English somplr.pptx
ssuser2aa19f
 
Medical Dataset including visualizations
Medical Dataset including visualizationsMedical Dataset including visualizations
Medical Dataset including visualizations
vishrut8750588758
 
Data Science Courses in India iim skills
Data Science Courses in India iim skillsData Science Courses in India iim skills
Data Science Courses in India iim skills
dharnathakur29
 

Transforming AI with Graphs: Real World Examples using Spark and Neo4j

  • 1. WIFI SSID:Spark+AISummit | Password: UnifiedDataAnalytics
  • 2. Alicia Frame, PhD Senior Data Scientist, Neo4j Transforming AI with Graphs: Real World Examples with Spark & Neo4j #UnifiedDataAnalytics #SparkAISummit
  • 4. Financial Services Drug Discovery Recommendations Cybersecurity Predictive Maintenance Customer Segmentation Churn Prediction Search/MDM Graph Data Science Applications
  • 5. CAR DRIVES name: “Dan” born: May 29, 1970 twitter: “@dan” name: “Ann” born: Dec 5, 1975 since: Jan 10, 2011 brand: “Volvo” model: “V70” Latitude: 37.5629900° Longitude: -122.3255300° Nodes • Can have Labels to classify nodes • Labels have native indexes Relationships • Relate nodes by type and direction Properties • Attributes of Nodes & Relationships • Stored as Name/Value pairs • Can have indexes and composite indexes MARRIED TO LIVES WITH OW NS PERSON PERSON 7 Labeled Property Graphs
  • 6. • Current data science models ignore network structure • Graphs add highly predictive features to existing ML models • Otherwise unattainable predictions based on relationships Novel & More Accurate Predictions with the Data You Already Have Machine Learning Pipeline
  • 7. “The idea is that graph networks are bigger than any one machine-learning approach. Graphs bring an ability to generalize about structure that the individual neural nets don't have.” "Where do the graphs come from that graph networks operate over?”
  • 8. Building a Graph ML Model Data Sources Native Graph Platform Machine Learning Aggregate Disparate Data and Cleanse Build Predictive ModelsUnify Graphs and Engineer Features Parquet JSON and more… MLlib and more…
  • 9. Spark Graph Native Graph Platform Machine Learning Example: Spark & Neo4j Workflow Graph Transactions Graph Analytics Cypher 9 in Spark 3.0 to create non- persistent graphs MLlib to Train Models Native Graph Algorithms, Processing, and Storage
  • 10. Explore Graphs Build Graph Solutions • Massively scalable • Powerful data pipelining • Robust ML Libraries • Non-persistent, non-native graphs • Persistent, dynamic graphs • Graph native query and algorithm performance • Constantly growing list of graph algorithms and embeddings
  • 11. The Steps of Graph Data Science Query Based Knowledge Graph Query Based Feature Engineering Graph Algorithm Feature Engineering Graph Embeddings Graph Neural Networks DataScienceComplexity Knowledge Graphs Graph Feature Engineering Graph Native Learning Graph Persistence
  • 12. Steps Forward in Graph Data Science Query Based Knowledge Graph Query Based Feature Engineering Graph Algorithm Feature Engineering Graph Embeddings Graph Neural Networks Enterprise Maturity DataScienceComplexity
  • 13. Query based knowledge graphs: Connecting the Dots at NASA “Using Neo4j someone from our Orion project found information from the Apollo project that prevented an issue, saving well over two years of work and one million dollars of taxpayer funds.”
  • 14. Steps Forward in Graph Data Science Query Based Knowledge Graph Graph Algorithm Feature Engineering Graph Embeddings Graph Neural Networks Query Based Feature Engineering Enterprise Maturity DataScienceComplexity
  • 15. Churn prediction research has found that simple hand- engineered features are highly predictive • How many calls/texts has an account made? • How many of their contacts have churned? Query-Based Feature Engineering Telecom-churn prediction Telecommunication networks are easily represented as graphs
  • 16. Query-Based Feature Engineering Telecom-churn prediction 23 Add connected features based on graph queries to tabular data Khan et al, 2015
  • 17. Spark Graph Native Graph Platform Machine Learning • Merge distributed data into DataFrames • Reshape your tables into graphs • Explore cypher queries • Move to Neo4j to build expert queries • Persist your graph Knowledge Graphs: Getting Started Example with Spark • Bring query based graph features to ML pipeline Graph Transactions Graph Analytics
  • 18. Steps Forward in Graph Data Science Query Based Feature Engineering Graph Embeddings Graph Neural Networks Query Based Knowledge Graph Graph Algorithm Feature Engineering Enterprise Maturity DataScienceComplexity
  • 19. Feature Engineering is how we combine and process the data to create new, more meaningful features, such as clustering or connectivity metrics. Graph Feature Engineering Add More Descriptive Features: - Influence - Relationships - Communities
  • 20. Graph Feature Categories & Algorithms Pathfinding & Search Finds the optimal paths or evaluates route availability and quality Centrality / Importance Determines the importance of distinct nodes in the network Community Detection Detects group clustering or partition options Heuristic Link Prediction Estimates the likelihood of nodes forming a relationship Evaluates how alike nodes are Similarity Embeddings Learned representations of connectivity or topology
  • 21. • Connected components to identify disjointed graphs sharing identifiers • PageRank to measure influence and transaction volumes • Louvain to identify communities that frequently interact • Jaccard to measure account similarity based on relationships Financial Crime: Detecting Fraud Large financial institutions already have existing pipelines to identify fraud via heuristics and models Graph based features improve accuracy:
  • 22. +142,000 Peer Reviewed Publications Graph Fraud / Anomaly Detection in the last 10 years
  • 23. Spark Graph Native Graph Platform Machine Learning • Merge distributed data into DataFrames • Reshape your tables into graphs • Explore cypher queries and simple algorithms • Persist your graph • Create rule based features • Run native graph algorithms and write to graph or stream Graph Feature Engineering: Getting Started Example with Spark • Bring graph features to ML pipeline for training Graph Transactions Graph Analytics
  • 24. Graph Algorithms in Neo4J • Parallel Breadth First Search • Parallel Depth First Search • Shortest Path • Single-Source Shortest Path • All Pairs Shortest Path • Minimum Spanning Tree • A* Shortest Path • Yen’s K Shortest Path • K-Spanning Tree (MST) • Random Walk • Degree Centrality • Closeness Centrality • CC Variations: Harmonic, Dangalchev, Wasserman & Faust • Betweenness Centrality • Approximate Betweenness Centrality • PageRank • Personalized PageRank • ArticleRank • Eigenvector Centrality • Triangle Count • Clustering Coefficients • Connected Components (Union Find) • Strongly Connected Components • Label Propagation • Louvain Modularity – 1 Step & Multi-Step • Balanced Triad (identification) • Euclidean Distance • Cosine Similarity • Jaccard Similarity • Overlap Similarity • Pearson Similarity Pathfinding & Search Centrality / Importance Community Detection Similarity neo4j.com/docs/ graph-algorithms/current/ Link Prediction • Adamic Adar • Common Neighbors • Preferential Attachment • Resource Allocations • Same Community • Total Neighbors
  • 25. Graph Algorithms in Neo4J • Parallel Breadth First Search • Parallel Depth First Search • Shortest Path • Single-Source Shortest Path • All Pairs Shortest Path • Minimum Spanning Tree • A* Shortest Path • Yen’s K Shortest Path • K-Spanning Tree (MST) • Random Walk • Degree Centrality • Closeness Centrality • CC Variations: Harmonic, Dangalchev, Wasserman & Faust • Betweenness Centrality • Approximate Betweenness Centrality • PageRank • Personalized PageRank • ArticleRank • Eigenvector Centrality • Triangle Count • Clustering Coefficients • Connected Components (Union Find) • Strongly Connected Components • Label Propagation • Louvain Modularity – 1 Step & Multi-Step • Balanced Triad (identification) • Euclidean Distance • Cosine Similarity • Jaccard Similarity • Overlap Similarity • Pearson Similarity Pathfinding & Search Centrality / Importance Community Detection Similarity neo4j.com/docs/ graph-algorithms/current/ Link Prediction • Adamic Adar • Common Neighbors • Preferential Attachment • Resource Allocations • Same Community • Total Neighbors
  • 26. Steps Forward in Graph Data Science Query Based Knowledge Graph Graph Algorithm Feature Engineering Graph Neural Networks Query Based Feature Engineering Graph Embeddings Enterprise Maturity DataScienceComplexity
  • 27. Embedding transforms graphs into a vector, or set of vectors, describing topology, connectivity, or attributes of nodes and edges in the graph Graph Embeddings • Vertex embeddings: describe connectivity of each node • Path embeddings: traversals across the graph • Graph embeddings: encode an entire graph into a single vector
  • 28. Explainable Reasoning over Knowledge Graphs for Recommendation Graph Embeddings - Recommendations
  • 29. Explainable Reasoning over Knowledge Graphs for Recommendation Graph Embeddings - Recommendations
  • 30. Spark Graph Native Graph Platform Machine Learning • Merge distributed data into DataFrames • Reshape your tables into graphs • Explore cypher queries and simple algorithms • Move to Neo4j to build expert queries • Write to persist • Stay tuned for DeepWalk and DeepGL algorithms Graph Feature Engineering: Getting Started Example with Spark • Bring graph features to ML pipeline for training Graph Transactions Graph Analytics
  • 31. Steps Forward in Graph Data Science Query Based Knowledge Graph Graph Algorithm Feature Engineering Query Based Feature Engineering Graph Neural Networks Graph Embeddings Enterprise Maturity DataScienceComplexity
  • 32. Deep Learning refers to training multi-layer neural networks using gradient descent Graph Native Learning
  • 33. Graph Native Learning refers to deep learning models that take a graph as an input, performs computations, and return a graph Graph Native Learning Battaglia et al, 2018
  • 34. Example: electron path prediction Bradshaw et al, 2019 Graph Native Learning Given reactants and reagents, what will the products be? Given reactants and reagents, what will the products be?
  • 35. Example: electron path prediction Graph Native Learning
  • 36. Progressing in Graph Data Science Query Based Knowledge Graph Query Based Feature Engineering Graph Algorithm Feature Engineering Graph Embeddings Graph Neural Networks Enterprise Maturity DataScienceComplexity Knowledge Graphs Graph Feature Engineering Graph Native Learning Graph Persistence
  • 37. Resources Business • neo4j.com/use-cases/artificial-intelligence-analytics/ Data Scientists/Developers • neo4j.com/sandbox • neo4j.com/developer/ • community.neo4j.com [email protected] @aliciaframe1 44#UnifiedAnalytics #SparkAISummit
  • 38. DON’T FORGET TO RATE AND REVIEW THE SESSIONS SEARCH SPARK + AI SUMMIT