Data Engineering
for RAG
ABK of Neo4j
(Andreas Kollegger)
Generative AI
Neo4j Inc. All rights reserved 2024
Generative AI
● Learns random sentences from random people
● Talks like a person but doesn't really understand what it's saying
● Occasionally speaks absolute nonsense
● Sensitive to question phrasing
● Answers reflect the person asking
● Can't explain or verify answers
● Limited to public "knowledge"
How do we integrate with the alien technology?
Everything starts with practical work, using RAG…
Retrieval Augmented Generation (RAG)
RAG is a software design pattern for integrating GenAI Apps with custom data sources, like a database.
A Generative AI application uses an LLM to provide responses to user prompts (for example, ChatGPT)
[Diagram: the User sends a User Prompt to the GenAI Application, which passes the User Prompt to the LLM; the LLM's Response comes back through the application as the Complete Response]
RAG augments the LLM by intercepting a user's prompt, then making a query to a database, then using the query results as context for the user's prompt, creating a new prompt that is passed to the LLM for a complete, curated response.
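[Diagram: User, GenAI Application, Database, and LLM. Step 1: the application sends the User Prompt to the Database and receives Context. Step 2: the application sends User Prompt + Context to the LLM, receives a Response, and returns the Complete Response to the User]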
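The two steps described above can be sketched in a few lines of Python. This is a minimal illustration, not the presenter's code: `build_augmented_prompt`, `rag_answer`, and the two stand-in functions are hypothetical names, and the prompt wording is an assumption.

```python
from typing import Callable, List


def build_augmented_prompt(user_prompt: str, context: List[str]) -> str:
    """Combine retrieved context with the user's original prompt."""
    context_block = "\n".join(f"- {item}" for item in context)
    return (
        "Answer the question using the context below.\n"
        f"Context:\n{context_block}\n"
        f"Question: {user_prompt}"
    )


def rag_answer(
    user_prompt: str,
    retrieve: Callable[[str], List[str]],
    llm: Callable[[str], str],
) -> str:
    """Intercept the prompt, query the database, then call the LLM."""
    context = retrieve(user_prompt)  # step 1: query the database
    prompt = build_augmented_prompt(user_prompt, context)
    return llm(prompt)  # step 2: pass prompt + context to the LLM


# Hypothetical stand-ins for a real database query and a real LLM call.
def fake_retrieve(prompt: str) -> List[str]:
    return ["Example context returned by the database."]


def fake_llm(prompt: str) -> str:
    return f"(response generated from a prompt of {len(prompt)} characters)"


print(rag_answer("What is Apple's primary business?", fake_retrieve, fake_llm))
```

Passing `retrieve` and `llm` as parameters keeps the pattern independent of any particular database or model, which is the point of treating RAG as a design pattern.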
This sets up a knowledge stack…
● the user knows something about the question they're asking
● the application knows something about the user
● the database knows about particular information and data
● the LLM knows about whatever it found on the internet
[Diagram: Knowledge Stack with four layers: User Knowledge, App Knowledge, Database Knowledge, LLM Knowledge]
Knowledge you control, in the app and the database.
Three Sources of Data for RAG
Each with different access patterns, supporting different kinds of questions.

Pure Text
Unstructured data in PDFs, plain text files, or images
Information search: “What is Apple's primary business?”
Answer with: Implicit knowledge derived from text.

Pure Data
Structured data in a database
Information query: “How many iPhones did Apple sell this quarter?”
Answer with: Explicit facts from a database query.

Mixed Text + Data
Structured data together with long-form text
Information discovery: “Which investors will be impacted by a chip shortage?”
Answer with: Combined search and data query.

A Knowledge Graph:
Information architecture for data, organized using graph structures,
which places data within context.
Graph RAG:
Supports multiple modes of information retrieval, including
information search, information query, and information discovery.
Retrieval mode by data source:
● Pure Text, Vector Search: find relevant documents plus context for information search
● Mixed Text + Data, Search + Pattern Matching: expand context and rank the relevance for information discovery
● Pure Data, Graph Queries: directly query the knowledge graph for information query
GenAI Example:
SEC EDGAR
Financial Forms
SEC EDGAR Financial Data
The EDGAR database provides free public access to company information, allowing research about public company financial information and operations through the filings they submit to the SEC.
There are two forms that we'll look at today:
1. Form 10K - filings from publicly traded companies
2. Form 13 - filings from institutional investment management firms
Data Modeling Strategy
Start with a Minimum Viable Graph (MVG)
Create, Enhance, Connect, then repeat to grow the graph
1. Create - identify interesting information, create records
2. Enhance - enrich the data along some dimension (for example, an embedding or an index)
3. Connect - connect information to expand context and reveal knowledge
Create - Form 10K text chunks
1. Source - Form 10K
2. Split Text
3. Create Nodes
[Diagram: the text of a Form 10K (placeholder text "Lorem ipsum dolor sit amet, consectetur adipiscing elit, …") is split into pieces, and each piece becomes a Chunk node]
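A minimal sketch of the split-and-create step, assuming a simple fixed-size split by word count. The slides do not say which splitter was used, so `split_text`, `create_chunk_records`, the `max_words` parameter, and the `chunkId` naming scheme are all hypothetical.

```python
from typing import Dict, List


def split_text(text: str, max_words: int = 8) -> List[str]:
    """Split text into consecutive pieces of at most max_words words."""
    words = text.split()
    return [
        " ".join(words[i:i + max_words])
        for i in range(0, len(words), max_words)
    ]


def create_chunk_records(
    form_id: str, text: str, max_words: int = 8
) -> List[Dict[str, str]]:
    """Create one record per piece; each record would become a Chunk node."""
    return [
        {
            "formId": form_id,
            "chunkId": f"{form_id}-chunk{seq:04d}",  # hypothetical id scheme
            "text": piece,
        }
        for seq, piece in enumerate(split_text(text, max_words))
    ]


sample = (
    "Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do "
    "eiusmod tempor incididunt ut labore et dolore magna aliqua."
)
for record in create_chunk_records("form-0001", sample):
    print(record["chunkId"], "|", record["text"])
```

Real pipelines usually split on sentence or section boundaries and often overlap adjacent chunks; the fixed word count here only keeps the example short.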
Enhance - Text with an embedding
1. Source - Chunks
4. Add Embedding
[Diagram: each Chunk's text gets an embedding (example vectors: [0.6,0.2,0.1,0.7], [0.5,0.2,0.1,0.7], [0.4,0.2,0.1,0.7], [0.3,0.2,0.1,0.5], [0.2,0.2,0.1,0.7]) that is stored in a Vector Index]
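To show what the embeddings are for, here is a brute-force similarity search over the toy vectors from the slide. It is a sketch only: a real system would get embeddings from an embedding model and query a vector index instead of scanning every vector, and the chunk ids and the choice of cosine similarity are assumptions.

```python
import math
from typing import Dict, List, Tuple


def cosine_similarity(a: List[float], b: List[float]) -> float:
    """Cosine of the angle between two vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(
    query: List[float], index: Dict[str, List[float]], k: int = 2
) -> List[Tuple[str, float]]:
    """Return the k chunk ids whose embeddings are most similar to the query."""
    scored = [(cid, cosine_similarity(query, vec)) for cid, vec in index.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]


# Toy 4-dimensional vectors from the slide; the chunk ids are hypothetical.
vector_index = {
    "chunk0": [0.6, 0.2, 0.1, 0.7],
    "chunk1": [0.5, 0.2, 0.1, 0.7],
    "chunk2": [0.4, 0.2, 0.1, 0.7],
    "chunk3": [0.3, 0.2, 0.1, 0.5],
    "chunk4": [0.2, 0.2, 0.1, 0.7],
}

# Stand-in for the embedding of a user's question.
query_embedding = [0.6, 0.2, 0.1, 0.7]
print(top_k(query_embedding, vector_index, k=2))
```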
Connect - Connect chunks into a list
1. Connect Chunks
[Diagram: consecutive Chunk nodes linked by NEXT relationships, (Chunk)-[NEXT]->(Chunk)]
Create, Enhance, Connect Form 10K
1. Source - Form 10K
2. Split Text
3. Create Nodes
4. Add Embedding
5. Connect
[Diagram: the full pipeline from Form 10K text to embedded Chunk nodes linked by NEXT, with the stages labeled Extract, Enhance, Expand]
Minimum Viable Graph
Linked List of Text
[Diagram: (Chunk)-[NEXT]->(Chunk)]
Chunk properties:
● formId: string
● chunkId: string
● text: string
● textEmbedding: float[] (vector index)
Benefits:
● vector similarity search to find relevant text
● expand context window with previous/next chunks
● enable paging through text
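The linked list and the "expand context window" benefit can be sketched with a plain dictionary standing in for NEXT relationships. The function names and chunk ids below are hypothetical; in a graph database the same lookup would be a short pattern match over NEXT.

```python
from typing import Dict, List, Optional


def link_chunks(chunk_ids: List[str]) -> Dict[str, Optional[str]]:
    """Build a NEXT mapping: each chunk id points to the chunk after it."""
    next_of: Dict[str, Optional[str]] = {cid: None for cid in chunk_ids}
    for current, following in zip(chunk_ids, chunk_ids[1:]):
        next_of[current] = following
    return next_of


def expand_window(hit: str, next_of: Dict[str, Optional[str]]) -> List[str]:
    """Return the previous chunk, the hit, and the next chunk, in order.

    Neighbors that do not exist (start or end of the list) are skipped.
    """
    previous = [cid for cid, nxt in next_of.items() if nxt == hit]
    following = [next_of[hit]] if next_of.get(hit) else []
    return previous + [hit] + following


next_of = link_chunks(["chunk0", "chunk1", "chunk2", "chunk3"])
print(expand_window("chunk2", next_of))
```

Paging through text works the same way: keep following the NEXT mapping from the current chunk until it returns `None`.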
Improve Context
Hierarchical Summary
Create - create separate Form nodes for each Form 10K. Add summary.
Enhance - vector index of summary.
Connect - connect from Form to first node in linked list. Then from each chunk back to the Form node.
Benefits:
● expand context of chunk with summary text
● navigate from form to text
Form properties:
● cusip6: string
● formId: string
● summary: string
● summaryEmbedding: float[] (vector index)
[Diagram: Form and Chunk nodes with the relationships SECTION, PART_OF, and NEXT]
Add Form 13
Structured Data
Create - create Manager and Company nodes
Enhance - full-text index of names
Connect - connect Manager nodes to Company nodes through investments
Benefits:
● pattern-matching queries
● search names by text similarity (Apple and Apple Inc) rather than conceptual similarity (Apple and Banana)
Manager properties:
● name: string (full-text index)
● address: string
OWNS_STOCK_IN properties:
● shares: integer
● value: float
Company properties:
● name: string (full-text index)
● address: string
[Diagram: (Manager)-[OWNS_STOCK_IN]->(Company)]
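The difference between text similarity and conceptual similarity can be illustrated with a crude token-overlap score. This is only a stand-in for what a full-text index does (real full-text indexes use analyzers and relevance scoring); `tokens`, `token_overlap`, and `search_names` are hypothetical names.

```python
import re
from typing import List, Set, Tuple


def tokens(name: str) -> Set[str]:
    """Lowercase a name and split it into alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", name.lower()))


def token_overlap(a: str, b: str) -> float:
    """Share of tokens the two names have in common (Jaccard overlap)."""
    ta, tb = tokens(a), tokens(b)
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


def search_names(query: str, names: List[str]) -> List[Tuple[str, float]]:
    """Return names sharing at least one token with the query, best first."""
    scored = [(name, token_overlap(query, name)) for name in names]
    matches = [pair for pair in scored if pair[1] > 0]
    return sorted(matches, key=lambda pair: pair[1], reverse=True)


# "Apple" matches "Apple Inc" by shared text and does not match "Banana",
# even though an embedding model might place both fruit words close together.
print(search_names("Apple", ["Apple Inc", "Banana"]))
```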
Located at Address
Geospatial Search
Create - create Address nodes
Enhance - geospatial index of address
Connect - connect Manager and Company nodes to Address
Benefits:
● pattern-based location queries
● distance-based calculations, search companies within radius or bounding box
Address properties:
● city: string
● state: string
● country: string
● location: Point (geospatial index)
[Diagram: (Manager)-[OWNS_STOCK_IN]->(Company), with (Company|Manager)-[LOCATED_AT]->(Address)]
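A distance-based search can be sketched with the haversine formula over (latitude, longitude) pairs. This is a brute-force illustration under stated assumptions: the coordinates and names are made up, and a geospatial index would answer the same question without scanning every address.

```python
import math
from typing import Dict, List, Tuple

EARTH_RADIUS_KM = 6371.0  # mean Earth radius

Point = Tuple[float, float]  # (latitude, longitude) in degrees


def haversine_km(a: Point, b: Point) -> float:
    """Great-circle distance between two points, in kilometers."""
    lat1, lon1 = math.radians(a[0]), math.radians(a[1])
    lat2, lon2 = math.radians(b[0]), math.radians(b[1])
    dlat, dlon = lat2 - lat1, lon2 - lon1
    h = (
        math.sin(dlat / 2) ** 2
        + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2
    )
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def within_radius(
    center: Point, locations: Dict[str, Point], radius_km: float
) -> List[str]:
    """Return the names of all locations within radius_km of center."""
    return [
        name
        for name, point in locations.items()
        if haversine_km(center, point) <= radius_km
    ]


# Hypothetical company locations, chosen only to make the distances obvious.
company_locations = {
    "company-A": (0.0, 0.0),
    "company-B": (0.0, 1.0),
    "company-C": (10.0, 10.0),
}
print(within_radius((0.0, 0.0), company_locations, radius_km=150.0))
```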
Combine Graphs
Mixed Text & Data
Connect - connect Company nodes to the Form they filed
Benefits:
● expanded context for vector-based search
● refine search results by location
● expanded pattern matches
[Diagram: combined graph with Chunk, Form, Company, Manager, and Address nodes and the relationships NEXT, PART_OF, SECTION, FILED, OWNS_STOCK_IN, and LOCATED_AT]
Create, Enhance, Connect SEC Financial Forms

Sections from a Form
Source: Form 10K JSON files
1. Create: (:Chunk)
2. Enhance: Vector embedding
3. Connect: (Chunk)-[NEXT]->(Chunk)

Form 10K Nodes
Source: (:Chunk)
1. Create: (:Form)
2. Enhance: Vector embedding
3. Connect: (Chunk)-[PART_OF]->(Form)

Public Companies
Source: Form 13 CSV
1. Create: (:Company)
2. Enhance: Full-text index
3. Connect: (Company)-[FILED]->(Form)

Management Firms
Source: Form 13 CSV
1. Create: (:Manager)
2. Enhance: Full-text index
3. Connect: (Manager)-[OWNS_STOCK_IN]->(Company)

Addresses
Source: (:Company), (:Manager)
1. Create: (:Address)
2. Enhance: Geospatial index
3. Connect: (Company|Manager)-[LOCATED_AT]->(Address)

You can continue to grow the knowledge graph…
● cross-link Companies that mention each other
● add People, Places, Topics extracted from text (named entity recognition)
● add more Form data, or other related sources
● add User information to keep history, refine relevance and enable feedback
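To make the combined graph concrete, here is a toy traversal that follows the relationship types listed in the summary above: from a matching Chunk to its Form, to the Company that filed it, to the Managers that own stock in that company. It uses an in-memory edge list rather than a graph database, and every node id is hypothetical.

```python
from typing import List, Set, Tuple

Edge = Tuple[str, str, str]  # (start node, relationship type, end node)

# Hypothetical toy graph using the relationship types from the slides.
EDGES: List[Edge] = [
    ("chunk-1", "PART_OF", "form-A"),
    ("chunk-2", "PART_OF", "form-B"),
    ("company-A", "FILED", "form-A"),
    ("company-B", "FILED", "form-B"),
    ("manager-X", "OWNS_STOCK_IN", "company-A"),
    ("manager-Y", "OWNS_STOCK_IN", "company-A"),
    ("manager-Y", "OWNS_STOCK_IN", "company-B"),
]


def targets(start: str, rel: str, edges: List[Edge]) -> Set[str]:
    """Nodes reached by following rel outward from start."""
    return {e for s, r, e in edges if s == start and r == rel}


def sources(end: str, rel: str, edges: List[Edge]) -> Set[str]:
    """Nodes that point at end through rel."""
    return {s for s, r, e in edges if e == end and r == rel}


def investors_for_chunks(chunk_ids: List[str], edges: List[Edge]) -> Set[str]:
    """Chunk -PART_OF-> Form <-FILED- Company <-OWNS_STOCK_IN- Manager."""
    managers: Set[str] = set()
    for chunk in chunk_ids:
        for form in targets(chunk, "PART_OF", edges):
            for company in sources(form, "FILED", edges):
                managers |= sources(company, "OWNS_STOCK_IN", edges)
    return managers


# Suppose a vector search for "chip shortage" returned chunk-1.
print(sorted(investors_for_chunks(["chunk-1"], EDGES)))
```

The search step finds relevant text and the pattern match turns it into an answer about investors, which is the information discovery case.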
Resources & Next Steps
Code
github.com/neo4j-examples/sec-edgar-notebooks
Get Started with Neo4j - Aura Free
neo4j.com/cloud/aura-free/
GenAI Ecosystem & Free Learning Resources
neo4j.com/labs/genai-ecosystem/
graphacademy.neo4j.com/categories/llms/
Thank you!
andreas.kollegger@neo4j.com