Neo4j Graph DB & LLM.graphs & genAI introduction & cheatsheet.pdf

Neo4j Graph Database & LLM
graphs & genAI: introduction + cheat sheet
Véronique Gendner – e-tissage.net – August 2024 – v1
• Label Property Graph Databases (code examples with Neo4j) 2
• Vector (semantic) search: finding elements by vector similarity 4
• Embeddings & vector (semantic) search in a Label Property Graph DB 5
• Text generation with LLM 9
• What is Retrieval Augmented Generation (RAG)? 12
• Graph RAG 13
• Graph RAG with Neo4j genAI plugin & APOC 14
• genAI with Neo4j Python module 15
  – Vector (semantic) similarity search with filters 17
  – Hybrid vector / full-text indexes search 18
  – Graph RAG 19
• With an LLM orchestration platform (code examples with LangChain) 20
  – Graph anchored vector stores 21
  – With retrieval query 25
  – Retriever 26
  – Graph RAG 27
  – Providing source of the generated text 29
• Entities and relations extraction to build Knowledge Graphs 30
• Limitations 32
• Graph DB & LLM: wrap up 33
• Resources 34

2
Label Property Graph Databases (LPG DB)
[Figure: a graph database whose nodes carry properties such as name, text, description, title, url and authors]
The structure is a graph, i.e. stuff related to other stuff.
Basic elements are:
• Nodes with labels
• Relations
• Properties (on nodes and relations)
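To make these three basic elements concrete, here is a minimal in-memory sketch in Python. It is our own illustration of the data model, not how Neo4j stores data; the class and method names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    id: int
    labels: set                 # e.g. {"Person"}
    properties: dict = field(default_factory=dict)


@dataclass
class Relation:
    type: str                   # e.g. "ACTED_IN"
    start: int                  # id of the start node
    end: int                    # id of the end node
    properties: dict = field(default_factory=dict)


@dataclass
class PropertyGraph:
    nodes: dict = field(default_factory=dict)      # id -> Node
    relations: list = field(default_factory=list)  # Relation objects

    def add_node(self, node_id, labels, **properties):
        self.nodes[node_id] = Node(node_id, set(labels), properties)
        return self.nodes[node_id]

    def add_relation(self, rel_type, start, end, **properties):
        relation = Relation(rel_type, start, end, properties)
        self.relations.append(relation)
        return relation


# The example used on the next slide: properties on nodes AND on the relation
graph = PropertyGraph()
graph.add_node(1, ["Person"], name="Keanu Reeves", born=1964)
graph.add_node(2, ["Movie"], title="The Matrix")
graph.add_relation("ACTED_IN", 1, 2, role="Neo")
```

Note that the relation itself carries a property (role), which is what distinguishes a label property graph from a plain node/edge list.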

3
Cypher: graph DB query language
Label Property Graph DBs are queried with the query language Cypher, which lets you write graph patterns made of node labels, relation types and properties, for instance (:Movie)-[:ACTED_IN]-(:Person).
[Figure: a :Movie node m (title: The Matrix) linked by an ACTED_IN relation (role: Neo) to a :Person node p (name: Keanu Reeves, born: 1964)]
How to get the movies Keanu Reeves acted in?
MATCH (m:Movie)-[:ACTED_IN]-(p:Person {name: "Keanu Reeves"})
RETURN m.title
See also: An Introduction to graph query language Cypher, by the author
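To show what the pattern match does, here is a rough Python emulation over a tiny in-memory dataset. The extra movie, person and roles are made-up illustration data, and a real DB uses indexes rather than a full scan.

```python
# node id -> (label, properties); illustrative data
nodes = {
    1: ("Person", {"name": "Keanu Reeves", "born": 1964}),
    2: ("Movie", {"title": "The Matrix"}),
    3: ("Movie", {"title": "John Wick"}),
    4: ("Person", {"name": "Carrie-Anne Moss"}),
}
# relations as (start id, type, end id, properties)
relations = [
    (1, "ACTED_IN", 2, {"role": "Neo"}),
    (1, "ACTED_IN", 3, {"role": "John Wick"}),
    (4, "ACTED_IN", 2, {"role": "Trinity"}),
]


def movies_acted_in(person_name):
    """Rough equivalent of
    MATCH (m:Movie)-[:ACTED_IN]-(p:Person {name: $name}) RETURN m.title
    The pattern has no arrow, so both directions are checked."""
    titles = []
    for start, rel_type, end, _ in relations:
        if rel_type != "ACTED_IN":
            continue
        for p_id, m_id in ((start, end), (end, start)):
            p_label, p_props = nodes[p_id]
            m_label, m_props = nodes[m_id]
            if (p_label == "Person" and m_label == "Movie"
                    and p_props.get("name") == person_name):
                titles.append(m_props["title"])
    return titles
```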

4
Vector search
Vector search, often called semantic search = finding elements through vector similarity between the embedding (= vector representation) of the question and the embeddings stored in a vector index.
[Figure: question → vector encoding → embedding of the question → vector similarity against the vector index → result]
Example: finding movies with a plot similar to the plot of « Toy Story »: "A cowboy doll is profoundly threatened and jealous when a new spaceman figure supplants him as top toy in a boy's room."
Example from GraphAcademy course Neo4j & LLM Fundamentals
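A minimal sketch of the idea, assuming cosine similarity as the similarity function. It compares the query with every item, whereas a real vector index uses an approximate nearest-neighbour structure; the 3-dimensional vectors are made up (real embeddings have e.g. 1536 dimensions).

```python
import math


def cosine(a, b):
    """Cosine similarity between two vectors of the same length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def vector_search(query_vec, index, k=1):
    """index: dict item -> embedding. Returns the k most similar items,
    best first, as (item, score) pairs."""
    scored = [(item, cosine(query_vec, vec)) for item, vec in index.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Made-up plot "embeddings"
plot_index = {
    "Toy Story": [0.9, 0.1, 0.0],
    "The Little Rascals": [0.8, 0.2, 0.1],
    "Titanic": [0.0, 0.2, 0.9],
}
```

Querying with the embedding of the Toy Story plot returns Toy Story itself first, then the closest other plot.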

5
Graph Database with embeddings in node properties
A document is split into chunks of text. Each chunk is represented by a node, with the text in one property and the corresponding embedding in another property.
Encoding text properties of nodes as embeddings (= vectors):
MATCH (chunk:Chunk)
WHERE chunk.embedding IS NULL
WITH chunk, genai.vector.encode(
chunk.text,
"OpenAI",
{token: $openAiApiKey,
endpoint: $openAiEndpoint}) AS vector
CALL db.create.setNodeVectorProperty(
chunk,
"embedding",
vector)
[Figure: in the graph database, each chunk node now carries an embedding property next to its text properties]
Doc: genai.vector.encode & setNodeVectorProperty
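The query above only encodes chunks whose embedding is still NULL, so it can be re-run safely. Here is that logic as a Python sketch; toy_encode is a hypothetical stand-in for genai.vector.encode (a bag-of-characters count, not a real embedding model).

```python
def toy_encode(text, dims=8):
    """Hypothetical stand-in for an embedding model: counts letters into
    dims buckets. NOT a semantic embedding."""
    vec = [0.0] * dims
    for ch in text.lower():
        if ch.isalpha():
            vec[ord(ch) % dims] += 1.0
    return vec


def encode_missing(chunks, encode=toy_encode):
    """Like MATCH (chunk:Chunk) WHERE chunk.embedding IS NULL ...:
    only chunks without an embedding are encoded. Returns how many were set."""
    count = 0
    for chunk in chunks:
        if chunk.get("embedding") is None:
            chunk["embedding"] = encode(chunk["text"])
            count += 1
    return count


chunks = [
    {"text": "NetApp is a data-centric software company."},
    {"text": "Already encoded chunk.", "embedding": [1.0] * 8},
]
```

A second run encodes nothing, which is the point of the IS NULL filter.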

6
Graph Database with embeddings in node properties + vector index
A vector index of the embeddings of the text chunks is built.
CREATE VECTOR INDEX `chunks_text_embedding` IF NOT EXISTS
FOR (c:Chunk) ON (c.embedding)
OPTIONS { indexConfig: {
  `vector.dimensions`: 1536,
  `vector.similarity_function`: 'cosine'
}}
Doc: vector indexes
[Figure: the embedding properties of the chunk nodes are gathered in the vector index chunks_text_embedding]
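The two index options fix the vector length and the similarity function. The toy class below sketches what they mean. The score normalizations to a 0..1 range ((1 + cos) / 2 for cosine, 1 / (1 + d²) for euclidean) are our assumption of how Neo4j reports scores, to be checked against the vector index docs; the class is hypothetical and nothing like Neo4j's actual index structure.

```python
import math


class ToyVectorIndex:
    """Hypothetical sketch: fixed dimensions + a similarity function."""

    def __init__(self, dimensions, similarity_function="cosine"):
        if similarity_function not in ("cosine", "euclidean"):
            raise ValueError("similarity_function must be 'cosine' or 'euclidean'")
        self.dimensions = dimensions
        self.similarity_function = similarity_function
        self.entries = {}

    def add(self, key, vector):
        # vectors of the wrong length are rejected, like vector.dimensions
        if len(vector) != self.dimensions:
            raise ValueError(f"expected {self.dimensions} dimensions, got {len(vector)}")
        self.entries[key] = vector

    def score(self, a, b):
        if self.similarity_function == "cosine":
            denom = math.hypot(*a) * math.hypot(*b)
            cos = sum(x * y for x, y in zip(a, b)) / denom if denom else 0.0
            return (1 + cos) / 2              # assumed normalization to [0, 1]
        sq_dist = sum((x - y) ** 2 for x, y in zip(a, b))
        return 1 / (1 + sq_dist)              # assumed normalization to (0, 1]

    def query(self, k, vector):
        ranked = sorted(self.entries.items(),
                        key=lambda kv: self.score(vector, kv[1]), reverse=True)
        return [(key, self.score(vector, vec)) for key, vec in ranked[:k]]
```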

7
Vector (Semantic) Search in a graph DB
Vector similarity search between embeddings in the graph
This query returns the text property of the k=1 Chunk node whose embedding is most similar to the embedding of the text (= description) property of the Organisation node named Netapp.
MATCH (org:Organisation {name: 'Netapp'})
CALL db.index.vector.queryNodes(
  'chunks_text_embedding',
  1,
  org.embedding
) YIELD node, score
RETURN node.text
[Figure: the embedding of the Netapp node is compared with the chunk embeddings in the vector index chunks_text_embedding]
Doc: db.index.vector.queryNodes
Result:
'score': 0.935633659362793,
'text': '>Item 1. Business Overview
NetApp, Inc. (NetApp, we, us or the Company) is a global cloud-led, data-centric software company. We were incorporated in 1992 and are headquartered in San Jose, California. Building on more than three decades of innovation, we give customers the freedom to manage applications and data across hybrid multicloud environments. […]

8
Vector (Semantic) Search in a graph DB
Vector similarity search with an external question
This query returns the text property of the k=1 node whose text embedding is most similar to the embedding of the question.
question = "Tell me about Netapp"
WITH genai.vector.encode(
  $question,
  "OpenAI",
  { token: $openAiApiKey,
    endpoint: $openAiEndpoint
  }) AS question_embedding
CALL db.index.vector.queryNodes(
  'chunks_text_embedding',
  1,
  question_embedding
) YIELD node, score
RETURN node.text
[Figure: question → vector encoding → question embedding → vector similarity with the chunk embeddings in the vector index chunks_text_embedding]
Result:
'score': 0.935633659362793,
'text': '>Item 1. Business Overview
NetApp, Inc. (NetApp, we, us or the Company) is a global cloud-led, data-centric software company. We were incorporated in 1992 and are headquartered in San Jose, California. Building on more than three decades of innovation, we give customers the freedom to manage applications and data across hybrid multicloud environments. […]
Example from course Knowledge Graphs for RAG by Andreas Kollegger
Text generation with LLMs (genAI)
• Baseline generation
• Retrieval Augmented Generation (RAG)
• Graph RAG

10
Baseline text generation with LLM
As they generate the most probable sequence of words given the embedding of a question, LLMs actually encode 2 things:
• Information contained in the data they have been trained on
• The capacity to produce human-like sentences
Baseline text generation with an LLM fails if the requested information is outside the scope of the training data, i.e. more specific or private information, or if it concerns facts that happened after training (knowledge cutoff date).
[Figure: question → vector encoding → embedding of the question → text generation]
Question: Who was elected President of France in 1981?
Answer: François Mitterrand was elected President of France in 1981.

11
Text generation with context
If asked about a subject it has not been trained on, like more recent or private information, an LLM fails to return a meaningful response.
But…
If provided with the required information, the LLM's capability to produce human-like sentences can be used to formulate the response.
Without context: what are the names of my cousins? → [] (no meaningful answer)
With context: given this context: "My father's brother has a daughter named Julia, my mother's sister has a daughter named Claudia and a son named Peter", what are the names of my cousins? → Your cousins are named Julia, Claudia, and Peter.
[Figure: in both cases, question → vector encoding → embedding of the question → text generation]
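The only difference between the two calls is the text sent to the model. A minimal Python sketch of how that text is assembled; the function name is hypothetical and the wording follows the example.

```python
def with_context(question, context=None):
    """Text sent to the LLM: the bare question, or the question preceded
    by the information the model was not trained on."""
    if not context:
        return question
    return f'given this context: "{context}", {question}'


family = ("My father's brother has a daughter named Julia, my mother's sister "
          "has a daughter named Claudia and a son named Peter")
prompt = with_context("what are the names of my cousins ?", family)
```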

12
Retrieval Augmented Generation (RAG)
Retrieval Augmented Generation consists of sending the question together with context information extracted from a specific (private / updated) knowledge base, as well as a system prompt, i.e. plain-text instructions about the expected result.
generation prompt = question + context + system prompt
• Question: what are the names of my cousins?
• Context (database query on the specific knowledge base): My father's brother has a daughter named Julia, my mother's sister has a daughter named Claudia and a son named Peter
• System prompt: Answer the following Question based on the Context only and provide a reference to where in the context you found it. Only answer from the Context. If you don't know the answer, say 'I don't know'.
The LLM (vector encoding & text generation) answers: Your cousins are named Julia, Claudia, and Peter.
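The three parts of the generation prompt can be sketched in Python as follows. The retriever here is a hypothetical word-overlap ranking used only to keep the example self-contained; in the rest of this document retrieval is done by vector similarity in a graph DB.

```python
SYSTEM_PROMPT = (
    "Answer the following Question based on the Context only and provide a "
    "reference to where in the context you found it. Only answer from the "
    "Context. If you don't know the answer, say 'I don't know'."
)


def words(text):
    return {w.strip("?,.!'").lower() for w in text.split()}


def retrieve(question, knowledge_base, k=1):
    """Hypothetical retriever: ranks passages by the number of words they
    share with the question (a real system would use vector similarity)."""
    return sorted(knowledge_base,
                  key=lambda passage: len(words(question) & words(passage)),
                  reverse=True)[:k]


def build_generation_prompt(question, knowledge_base, k=1):
    """generation prompt = system prompt + context + question"""
    context = "\n".join(retrieve(question, knowledge_base, k))
    return f"{SYSTEM_PROMPT}\n\nContext:\n{context}\n\nQuestion: {question}"


kb = ["My father's brother has a daughter named Julia.", "Paris is sunny today."]
prompt = build_generation_prompt("What are the names of my cousins?", kb)
```

The resulting string is what would be sent to the generative LLM.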

13
Graph RAG
In Graph RAG, the context sent to the LLM for better grounded text generation is extracted from a graph-structured DB.
LLMs are typically* used twice:
• for DB extraction, by vector similarity with an embedding of the question
• for text generation of the answer
Using a graph DB to provide the LLM with context allows you to extract fine-grained contextual information around what the vector similarity search found.
generation prompt = question + context + system prompt (instructions in plain text about the expected result), where the context is a concatenation of text properties of nodes and relations extracted from the specific (private / updated) graph DB.
Example:
Question: Find a movie with a plot similar to the plot of « Toy Story »
[Figure: question → vector encoding → embedding of the question → vector similarity search in the vector index]
Answer: The little Rascal is a movie with a plot similar to « Toy Story »
* Other methods than vector similarity search can be used, as we will see in the LangChain section.
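A Python sketch of the two retrieval steps behind Graph RAG: vector similarity finds a node, then the graph is used to add the relations around it, and the text properties are concatenated into the context. The graph, the 2-dimensional embeddings and the function names are illustrative assumptions.

```python
import math


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


# Illustrative graph: node id -> properties (made-up 2-d embeddings)
nodes = {
    "m1": {"title": "Toy Story", "embedding": [0.9, 0.1],
           "plot": "A cowboy doll feels threatened by a new spaceman toy."},
    "m2": {"title": "Titanic", "embedding": [0.1, 0.9],
           "plot": "A romance aboard an ill-fated ship."},
    "p1": {"name": "John Lasseter"},
}
relations = [("p1", "DIRECTED", "m1")]  # (start, type, end)


def display(node_id):
    props = nodes[node_id]
    return props.get("title") or props.get("name")


def graph_rag_context(question_embedding, k=1):
    # 1. vector similarity search among the nodes that have an embedding
    scored = [(node_id, cosine(question_embedding, props["embedding"]))
              for node_id, props in nodes.items() if "embedding" in props]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    lines = []
    for node_id, _ in scored[:k]:
        lines.append(f"{nodes[node_id]['title']}: {nodes[node_id]['plot']}")
        # 2. graph expansion: relations touching the node found
        for start, rel_type, end in relations:
            if node_id in (start, end):
                lines.append(f"{display(start)} {rel_type} {display(end)}")
    # 3. concatenated text properties = context for the generative LLM
    return "\n".join(lines)
```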

14
Graph RAG with Neo4j genAI & APOC plugins
Here is a basic example of Graph RAG with Neo4j and the genAI & APOC plugins*.
If the movie returned by the vector similarity search with the question is an episode of a TV series, we can use a graph pattern to get related information about the series and the plots of all the other episodes, and pass them in the context provided to the generative LLM.
[Figure: (:Movie {title, plot, embedding})-[:EPISODE_OF]->(:Serie {title, description, embedding}); the question embedding is compared with the plot embeddings in the vector index]
Question: What is the name of Elly Conway's cat?
Context: concatenation of the extracted titles and synopses, e.g. "[…] fictional espionage novels […], quiet evenings at home become a thing of the past. Accompanied by her cat Alfie and Aiden, a cat-allergic spy …"
System prompt: Answer the following Question based on the Context only and provide a reference to where in the context you found it. Only answer from the Context. If you don't know the answer, say 'I don't know'...
Answer: Alfie
* Requires APOC extended
Example adapted from GraphRAG in (Almost) Pure Cypher by Christoffer Bergman
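The graph-pattern step can be sketched in Python: if the movie found is EPISODE_OF a serie, the context becomes the serie description plus the plots of all its episodes. The titles and plots below are placeholders, not the data of the original example.

```python
# Placeholder data mirroring (:Movie)-[:EPISODE_OF]->(:Serie)
movies = {
    "ep1": {"title": "Episode 1", "plot": "A novelist's quiet life is upended."},
    "ep2": {"title": "Episode 2", "plot": "She travels with her cat Alfie."},
    "film": {"title": "Standalone film", "plot": "Unrelated story."},
}
series = {"s1": {"title": "Some Serie", "description": "A spy comedy series."}}
episode_of = {"ep1": "s1", "ep2": "s1"}  # movie id -> serie id


def context_for_hit(movie_id):
    """Context for the movie found by vector search: its own plot, or, if it
    is an episode, the serie description plus the plots of all its episodes."""
    serie_id = episode_of.get(movie_id)
    if serie_id is None:
        movie = movies[movie_id]
        return f"{movie['title']}: {movie['plot']}"
    serie = series[serie_id]
    parts = [f"{serie['title']}: {serie['description']}"]
    for mid, sid in episode_of.items():
        if sid == serie_id:
            parts.append(f"{movies[mid]['title']}: {movies[mid]['plot']}")
    return "\n".join(parts)
```

Even if the vector search hits episode 1, the answer found in episode 2 (the cat's name) reaches the LLM.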

Neo4j genAI Python module
API in beta, subject to changes.
The LangChain integration that we will see next currently seems to be in a more stable state than the Python module.
pip install neo4j-genai
import neo4j_genai

16
Retriever
Retrievers are modules designed to retrieve documents from a vector store. They can use a vector store only, or be combined with other retrieval methods.
[Figure: the question goes through the embedder; VectorRetriever() uses the driver to query index_name in the graph database and returns node, score and the return_properties]
from neo4j import GraphDatabase
from neo4j_genai.embeddings.openai import OpenAIEmbeddings
from neo4j_genai.retrievers import VectorRetriever

# URI, AUTH: placeholders for your Neo4j connection settings
driver = GraphDatabase.driver(URI, auth=AUTH)
embedder = OpenAIEmbeddings(model="text-embedding-ada-002")
retriever = VectorRetriever(
    driver, index_name="moviePlotsEmbedding",
    embedder=embedder, return_properties=["title", "plot"])
question = "A movie about the famous sinking of the Titanic"
result = retriever.search(query_text=question, top_k=3)
result =
{'title': 'Titanic', 'plot': 'An unhappy married couple deal with their problems on board the ill-fated ship.'}, metadata={'score': 0.9450}
{'title': 'Night to Remember, A', 'plot': 'An account of the ill-fated maiden voyage of RMS Titanic in 1912.'}, metadata={'score': 0.9428}
{'title': 'Titanic', 'plot': 'A seventeen-year-old aristocrat falls in love with a kind, but poor artist aboard the luxurious, ill-fated R.M.S. Titanic.'}, metadata={'score': 0.9422}
Example from The Neo4j GenAI Package for Python - Getting started with retrieval and GraphRAG by Will Tai
Doc: Retriever Configuration
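What the retriever does with return_properties and top_k can be sketched with a hypothetical in-memory class. ToyVectorRetriever is not the neo4j_genai API; it only mimics the shape of the result (selected properties plus a score in metadata).

```python
import math


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class ToyVectorRetriever:
    """Hypothetical stand-in: embed the query text, rank stored nodes by
    similarity, return only return_properties plus the score."""

    def __init__(self, nodes, embedder, return_properties):
        self.nodes = nodes                  # dicts with an "embedding" key
        self.embedder = embedder            # callable: text -> vector
        self.return_properties = return_properties

    def search(self, query_text, top_k=5):
        query_vec = self.embedder(query_text)
        scored = sorted(((cosine(query_vec, n["embedding"]), n) for n in self.nodes),
                        key=lambda pair: pair[0], reverse=True)
        return [({p: n[p] for p in self.return_properties}, {"score": s})
                for s, n in scored[:top_k]]
```

The embedding property itself is not returned, only the properties useful as context.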

17
Search in graph anchored Retriever
Vector similarity + metadata filtering
When searching with a retriever, it is possible to add filters on properties*.
If a filter on the property country is added to the previous search, the UK-produced movie A Night to Remember comes first. USA-produced movies do not show in the result.
result = retriever.search(query_text=question, top_k=3, filters={"country": "UK"})
result =
{'title': 'Night to Remember, A', 'plot': 'An account of the ill-fated maiden voyage of RMS Titanic in 1912.'}, metadata={'score': 0.9428}
{'title': 'Deep Water', 'plot': 'A documentary about the disastrous 1968 round-the-world yacht race.'}, metadata={'score': 0.9302}
{'title': 'Pandora and the Flying Dutchman', 'plot': "A seductive woman falls in love with a mysterious ship's captain."}, metadata={'score': 0.9161}
* For Neo4j > 5.18
[Figure: same VectorRetriever() as above (driver, embedder, index_name, return_properties), with filters = {country: UK}: only movies produced in the UK are returned]
Doc: Metadata filtering; see also Metadata filtering in LangChain
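A sketch of the principle in Python: nodes whose properties do not match every filter are excluded, the rest are ranked by similarity. The embeddings and countries are made up; how the real retriever combines the filter with the index is not shown here.

```python
import math


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def filtered_search(query_vec, nodes, filters, top_k=3):
    """Keeps only nodes whose properties equal every filter value, then ranks
    them by cosine similarity with query_vec (exhaustive, best first)."""
    candidates = [n for n in nodes
                  if all(n.get(key) == value for key, value in filters.items())]
    candidates.sort(key=lambda n: cosine(query_vec, n["embedding"]), reverse=True)
    return [n["title"] for n in candidates[:top_k]]


# Made-up embeddings and countries, for illustration only
movies = [
    {"title": "Titanic", "country": "USA", "embedding": [1.0, 0.0]},
    {"title": "Night to Remember, A", "country": "UK", "embedding": [0.9, 0.1]},
    {"title": "Deep Water", "country": "UK", "embedding": [0.5, 0.5]},
]
```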

18
Hybrid Retriever: with combined vector and full-text indexes
Both with the Python module and in LangChain, you can run a hybrid search that combines the results of a vector similarity search in a vector index with the results of a search in a full-text index. All results are combined and returned in decreasing order of their respective scores (the full-text index score is normalized, see the query).
from neo4j_genai.retrievers import HybridRetriever

retriever = HybridRetriever(driver,
    vector_index_name="moviePlotsEmbedding",
    fulltext_index_name="movieTitleFulltext",
    embedder=embedder,
    return_properties=["title", "plot"])
result = retriever.search(query_text=question, top_k=3)
result =
{'title': 'Titanic', 'plot': 'An unhappy married couple deal […]'}, metadata={'score': 1}
{'title': 'Titanic', 'plot': 'A seventeen-year-old aristocrat falls […].'}, metadata={'score': 0.9422}
{'title': 'Night to Remember, A', 'plot': 'An account of the ill-fated maiden […]'}, […]
[Figure: HybridRetriever() (driver, embedder, vector_index_name, fulltext_index_name) gets node and score from the vector index and node and score/max from the full-text index, then reorders all results by score]
Doc: HybridRetriever; in LangChain, Neo4jVector with search_type="hybrid"
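The merge step shown in the figure (score/max for full-text results, then reorder by score) can be sketched as follows. Dividing by the highest full-text score and keeping the best score for a node found twice is our reading of the figure, not the library's code; titles and scores are illustrative.

```python
def hybrid_merge(vector_results, fulltext_results, top_k=3):
    """vector_results / fulltext_results: lists of (node, score).
    Full-text (Lucene) scores are unbounded, so each is divided by the
    highest full-text score. A node found by both searches keeps its best
    score, and everything is sorted by decreasing score."""
    best = {}
    for node, score in vector_results:
        best[node] = max(score, best.get(node, 0.0))
    if fulltext_results:
        max_ft = max(score for _, score in fulltext_results)
        if max_ft > 0:
            for node, score in fulltext_results:
                best[node] = max(score / max_ft, best.get(node, 0.0))
    ranked = sorted(best.items(), key=lambda item: item[1], reverse=True)
    return ranked[:top_k]
```

This explains why a title match can come first with a score of exactly 1.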

19
Graph RAG with Python
from neo4j_genai.llm import OpenAILLM
from neo4j_genai.generation import GraphRAG

# Note: the OPENAI_API_KEY must be in the env vars
genLLM = OpenAILLM(model_name="gpt-4o", model_params={"temperature": 0})
# Optional: pass prompt_template=... to adjust the default prompt (see doc)
rag = GraphRAG(retriever=retriever, llm=genLLM)
response = rag.search(query=question, retriever_config={"top_k": 5})
[Figure: the question goes to the retriever (driver, embedder, index_name), whose vector similarity search returns nodes and scores; this output, which depends on the retriever used, forms the context which, with the system prompt and the question, makes up the generation prompt sent to genLLM (vector encoding & text generation)]
Context (output of the vector retriever for the question "A movie about the famous sinking of the Titanic"):
{'title': 'Titanic', 'plot': 'An unhappy married couple deal with their problems on board the ill-fated ship.'}, metadata={'score': 0.945}
{'title': 'Night to Remember, A', 'plot': 'An account of the ill-fated maiden voyage of RMS Titanic in 1912.'}, metadata={'score': 0.9428}
{'title': 'Titanic', 'plot': 'A seventeen-year-old aristocrat falls in love with a kind, but poor artist aboard the luxurious, ill-fated R.M.S. Titanic.'}, metadata={'score': 0.9422}
response =
A movie about the famous sinking of the Titanic is Titanic, which tells the story of a seventeen-year-old aristocrat who falls in love with a kind, but poor artist aboard the luxurious, ill-fated R.M.S. Titanic. Another movie on the same topic is "Night to Remember, A," which provides an account of the ill-fated maiden voyage of RMS Titanic in 1912.
Example from The Neo4j GenAI Package for Python - Getting started with retrieval and GraphRAG by Will Tai
Doc: GraphRAG Configuration
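The composition behind GraphRAG (retriever, then prompt template, then LLM) can be sketched with a hypothetical class. ToyGraphRAG and its default template are illustrations, not the neo4j_genai implementation; retriever and llm are plain callables so the example runs without a database.

```python
class ToyGraphRAG:
    """Hypothetical sketch of the GraphRAG pattern: retriever -> prompt -> LLM."""

    DEFAULT_TEMPLATE = (
        "Answer the Question using only the Context.\n\n"
        "Context:\n{context}\n\n"
        "Question: {question}"
    )

    def __init__(self, retriever, llm, prompt_template=None):
        self.retriever = retriever  # callable(query, **config) -> list of items
        self.llm = llm              # callable(prompt) -> answer text
        self.prompt_template = prompt_template or self.DEFAULT_TEMPLATE

    def search(self, query, retriever_config=None):
        # 1. retrieval, with optional settings such as top_k
        items = self.retriever(query, **(retriever_config or {}))
        # 2. the retriever output becomes the context of the prompt
        context = "\n".join(str(item) for item in items)
        prompt = self.prompt_template.format(context=context, question=query)
        # 3. text generation
        return self.llm(prompt)
```

With an "echo" LLM that returns its prompt, you can inspect exactly what would be sent to the model.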

With an LLM orchestration platform
Orchestration platforms are used to chain several components with LLMs: splitting text into chunks, choosing between different processing pipelines (tools) according to the question, formatting output, as well as processing that involves an LLM: retrievers, prompt building, text or image generation, etc.
Possible orchestration platforms: the code examples in this document use LangChain. See the Resources (p. 34) for LlamaIndex.

21
Graph anchored vector store
In LangChain, Neo4jVector() can be used to create a vector store from a vector index that already exists in the graph DB. It is also a driver to the graph DB.
Let's see 3 of the different ways* to populate such a vector store:
• .from_existing_index
• .from_existing_graph
• .from_documents
Neo4jVector.from_existing_index
from langchain_community.vectorstores import Neo4jVector
from langchain_openai import OpenAIEmbeddings

NeoVect = Neo4jVector.from_existing_index(
    OpenAIEmbeddings(openai_api_key=OPENAI_API_KEY),
    url=url,
    username=username,
    password=password,
    index_name=index_name,
)
[Figure: in the orchestration platform, the Neo4jVector store wraps the existing vector index index_name of the graph database; OpenAIEmbeddings() calls the LLM to compute embeddings]
* The different ways to populate a Neo4jVector store
Doc: Neo4jVector.from_existing_index

22
Neo4jVector.from_existing_graph
In LangChain, you can also create a vector store from a graph DB by specifying a node_label and the list of text_node_properties that will be concatenated to compute an embedding. The embedding is stored in the embedding_node_property, and a vector index named index_name is created in the DB.
embedding specifies the module that computes the embeddings.
from langchain_community.vectorstores import Neo4jVector
from langchain_openai import OpenAIEmbeddings

NeoVect = Neo4jVector.from_existing_graph(
    embedding=OpenAIEmbeddings(openai_api_key=OPENAI_API_KEY),
    url=NEO4J_URI,
    username=NEO4J_USERNAME,
    password=NEO4J_PASSWORD,
    index_name="Organisation_vector_index",
    node_label="Organisation",
    text_node_properties=["name", "text"],
    embedding_node_property="embedding",
)
[Figure: for each Organisation node, name + text are concatenated, embedded with OpenAIEmbeddings(), stored in the embedding property and added to the vector index index_name]
Doc: Neo4jVector.from_existing_graph
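A Python sketch of what "concatenated to compute an embedding" means. The "key: value" line format is an assumption made for illustration (LangChain's exact formatting may differ), and embed is any callable standing in for OpenAIEmbeddings.

```python
def text_to_embed(node, text_node_properties):
    """Joins the chosen properties as 'key: value' lines (assumed format)
    into the single text from which one embedding is computed."""
    return "\n".join(f"{key}: {node.get(key, '')}" for key in text_node_properties)


def embed_nodes(nodes, node_label, text_node_properties, embed,
                embedding_node_property="embedding"):
    """Sets one embedding per node carrying node_label."""
    for node in nodes:
        if node_label in node["labels"]:
            text = text_to_embed(node, text_node_properties)
            node[embedding_node_property] = embed(text)


nodes = [
    {"labels": ["Organisation"], "name": "Netapp", "text": "Item 1"},
    {"labels": ["Person"], "name": "X"},
]
```

Other properties of the node (cik, source, etc.) are not part of the embedded text.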

23
Neo4jVector.from_documents
From LangChain Document objects, Neo4jVector.from_documents handles the whole process:
• creates chunk nodes with node_label
• computes an embedding of text_node_property and sets the corresponding node property
• creates the vector index index_name
from langchain_community.vectorstores.neo4j_vector import Neo4jVector
from langchain_community.document_loaders import TextLoader
from langchain_core.documents import Document
from langchain_openai import OpenAIEmbeddings
from langchain_text_splitters import CharacterTextSplitter

loader = TextLoader("state_of_the_union.txt")
stateOfUnion = loader.load()
text_splitter = CharacterTextSplitter(chunk_size=500, chunk_overlap=0)
chunks = text_splitter.split_documents(stateOfUnion)
NeoVect = Neo4jVector.from_documents(
    chunks,
    OpenAIEmbeddings(openai_api_key=OPENAI_API_KEY),
    url=NEO4J_URI, username=NEO4J_USERNAME,
    password=NEO4J_PASSWORD,
    index_name="vector",
    node_label="Chunk",
    text_node_property="text",
    embedding_node_property="embedding")
[Figure: State of the Union.txt is split into :Chunk nodes, each with a text and an embedding computed by OpenAIEmbeddings(), gathered in the vector index index_name]
Doc: Neo4jVector.from_documents
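A simplified sketch of the splitting step and of the chunk records it leads to. Note the assumption: CharacterTextSplitter first splits on a separator and then merges pieces up to chunk_size, whereas this sketch simply cuts fixed-size windows; to_chunk_nodes and the "source#i" naming are hypothetical.

```python
def split_text(text, chunk_size=500, chunk_overlap=0):
    """Simplified fixed-size splitter: windows of chunk_size characters,
    each starting chunk_size - chunk_overlap after the previous one."""
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("need 0 <= chunk_overlap < chunk_size")
    step = chunk_size - chunk_overlap
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]


def to_chunk_nodes(chunks, source):
    """Each chunk becomes a :Chunk-like record with its text and its source."""
    return [{"label": "Chunk", "text": c, "source": f"{source}#{i}"}
            for i, c in enumerate(chunks)]
```

With chunk_overlap > 0, consecutive chunks share characters, which helps when a sentence is cut at a chunk boundary.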

24
Search in graph anchored vector store
vector similarity
Here is how a simple similarity search (like the one done p. 7 directly in the graph DB) is done on a LangChain graph anchored vector store.
It returns the node properties as metadata and the text corresponding to the embedding used (similarity_search_with_score also returns the similarity score).
store.similarity_search("Tell me about Netapp", k=1)
[Figure: the question "Tell me about Netapp" is encoded with OpenAIEmbeddings() and compared by vector similarity with the embeddings in the store's vector index]
result:
metadata={'cik': '1002047', 'source': 'https://www.sec.gov/Archives/[…]-index.htm', 'formId': '0000950170-23-027948', 'names': ['Netapp Inc', 'NETAPP INC'], 'f10kItem': 'item1', 'chunkId': '0000950170-23-027948-item1-chunk0000', 'cusip6': '64110D', 'chunkSeqId': 0},
page_content='text:'>Item 1. Business Overview
NetApp, Inc. (NetApp, we, us or the Company) is a global cloud-led, data-centric software company. We were incorporated in 1992 and are headquartered in San Jose, California. […]
Doc: Neo4jVector.similarity_search

25
Vector store with retrieval query
A Neo4jVector store can combine the result of the vector similarity search with a Cypher retrieval query in the graph DB. This provides context around the result node (like we did with just Neo4j & APOC p. 14): here, the text of the previous and next chunks, linked by NEXT relations, is added to the text of the chunk found.
retrieval_query = """
MATCH window=(:Chunk)-[:NEXT*0..1]->(node)-[:NEXT*0..1]->(:Chunk)
WITH node, score, window
ORDER BY length(window) DESC LIMIT 1
WITH nodes(window) AS chunkList, node, score
UNWIND chunkList AS chunkRows
WITH collect(chunkRows.text) AS textList, node, score
RETURN apoc.text.join(textList, " \n ") AS text, score, node {.source} AS metadata
"""
store = Neo4jVector.from_existing_index(
    […],
    retrieval_query=retrieval_query)
store.similarity_search("Tell me about Netapp", k=1)
result:
metadata={'source': 'https://www.sec.gov/[…]index.htm'},
page_content='text:'>Item 1. Business Overview
NetApp, Inc. […] textA >Item 1A Risk Factors 14
textB […]
[Figure: the chunk found by vector similarity (text) is joined with its previous chunk (textA) and next chunk (textB) through the NEXT relations]
In the Python module, the equivalent is VectorCypherRetriever with a retrieval_query.
Simplified from course Knowledge Graphs for RAG by Andreas Kollegger
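The effect of the window retrieval query, sketched in Python on a linked list of chunks: the previous chunk (if any), the chunk found and the next chunk (if any) are joined, as with [:NEXT*0..1] on both sides. The chunk ids and texts are illustrative.

```python
def window_text(chunks, next_of, node_id, sep=" \n "):
    """chunks: id -> text; next_of: id -> id of the NEXT chunk.
    Joins the previous chunk, the chunk itself and the next chunk,
    skipping neighbours that do not exist (first / last chunk)."""
    prev_of = {nxt: cur for cur, nxt in next_of.items()}
    window = [c for c in (prev_of.get(node_id), node_id, next_of.get(node_id))
              if c is not None]
    return sep.join(chunks[c] for c in window)


chunks = {"c0": "A", "c1": "B", "c2": "C"}
next_of = {"c0": "c1", "c1": "c2"}
```

Without the *0..1, the first and last chunks of a document would have no matching window and would be dropped.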

26
Turning vector store to retriever
On orchestration platforms like LangChain, retrievers are components that can be integrated in a pipeline of processes (chain) to retrieve documents. Retrievers are often based on a vector store.
retriever = store.as_retriever()
retriever.invoke("What did the president say about Justice Breyer")
[Figure: the retriever sits between the previous and the next component of the chain and passes on text, metadata and score]
result:
[ Document(metadata={'source': 'state_of_the_union.txt#66'},
    page_content='One of the most serious constitutional responsibilities a President has is nominating someone to serve on the United States Supreme Court. […] One of our nation’s top legal minds, who will continue Justice Breyer’s legacy of excellence.'),
  Document(metadata={'source': 'state_of_the_union.txt#65'},
    page_content='[…], I’d like to honor someone who has dedicated his life to serve this country: Justice Stephen Breyer—an Army veteran, Constitutional scholar, and retiring Justice of the United States Supreme Court. Justice Breyer, thank you for your service.'),
  Document(metadata={'source': 'state_of_the_union.txt#67'},
    page_content='A former top litigator in private practice. A former federal public defender. And from a family of public school educators and police officers. A consensus builder. Since she’s been nominated, she’s received a broad range of […].') ]

27
Graph RAG with LangChain & Neo4j
Context search by entity extraction & full-text index
With a graph DB backed retriever, the context that feeds the text generation LLM can be extracted by vector similarity search, but also by a combination of vector similarity search and full-text index search.
Example from Enhancing the Accuracy of RAG Applications With Knowledge Graphs by Tomaz Bratanic, see code in
Question: Which house did Elizabeth I belong to?
[Figure: the question is used twice. Entity extraction gets "Elizabeth I", which is looked up in a full-text index (Lucene search) and then expanded by graph pattern search with Cypher; in parallel, a vector similarity search retrieves text chunks. Both results form the context which, with the system prompt and the question, makes up the generation prompt]
Context from vector similarity search:
Elizabeth I (7 September 1533 – 24 March 1603) was Queen of England and Ireland from 17 November 1558 until her death in 1603. She was the last monarch of the House of Tudor. Elizabeth was the only surviving child of Henry VIII and his second wife, Anne Boleyn. When Elizabeth was two years old, her parents' marriage was annulled, her mother was executed, and Elizabeth was declared illegitimate. Henry restored her to […]
Context from full-text search + graph pattern search with Cypher:
MATCH (node)-[r:!MENTIONS]->(neighbor)
RETURN node.id + ' - ' + type(r) + ' -> ' + neighbor.id AS output
Elizabeth I - PARENT -> Anne Boleyn
Elizabeth I - PARENT -> Henry Viii
Elizabeth I - MEMBER -> House Of Tudor
Elizabeth I - RULER -> Ireland
Elizabeth I - RULER -> England
Queen Elizabeth I - ARTIST -> Isaac Oliver
Queen Elizabeth I - APPOINTED_OFFICIAL -> George Gower
Queen Elizabeth I - FEATURES -> English Royal Portraits
Queen Elizabeth I - FEATURES -> Panel Paintings
[…]
Answer (vector encoding & text generation): Elizabeth I belonged to the House of Tudor
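The entity-based part of this retrieval can be sketched in Python. The entity extraction here is a hypothetical string lookup (in the example an LLM does it), and the relations are a small illustrative subset; the sketch looks at relations in both directions and skips MENTIONS, producing lines in the same "a - TYPE -> b" format.

```python
import re

# Illustrative relations (start, type, end)
relations = [
    ("Elizabeth I", "PARENT", "Anne Boleyn"),
    ("Elizabeth I", "MEMBER", "House Of Tudor"),
    ("Chunk 12", "MENTIONS", "Elizabeth I"),
]


def extract_entities(question, known_entities):
    """Hypothetical entity extraction: looks for known entity names in the
    question (an LLM would be used in practice)."""
    return [e for e in known_entities
            if re.search(re.escape(e), question, re.IGNORECASE)]


def neighbors_as_text(entity):
    """Relations touching the entity, except MENTIONS, as 'a - TYPE -> b'."""
    return [f"{start} - {rel_type} -> {end}"
            for start, rel_type, end in relations
            if rel_type != "MENTIONS" and entity in (start, end)]
```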

28
Context search by Cypher generation
With a graph DB backed retriever, context extraction can also be done by using an LLM to generate a Cypher query (see p. 32 about the limitations of this approach).
Question: What movies did Carrie-Anne Moss act in?
Prompt: You are an expert Neo4j Developer translating user questions into Cypher to answer questions about movies and provide recommendations. Convert the user's question based on the schema. Schema: {schema} Question: {question}
Generated Cypher (graph pattern search):
MATCH (p:Person {name: "Carrie-Anne Moss"})
-[:ACTED_IN]-(m:Movie)
RETURN m.title
Context (query result):
[{'m.title': 'The Matrix Revolutions'}, {'m.title': 'The Matrix Reloaded'}, {'m.title': 'The Matrix'}]
The context, the system prompt and the question then make up the generation prompt.
Answer (vector encoding & text generation): Carrie-Anne Moss acted in The Matrix Revolutions, The Matrix Reloaded, and The Matrix.
Python module: Text2CypherRetriever; LangChain: GraphCypherQAChain
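The Cypher generation step, sketched in Python: fill the prompt with the schema and the question, call the LLM, and keep only the query. Stripping a Markdown code fence from the reply is an assumption about a common model habit; llm is any callable, so a fake one is used for testing.

```python
import re

PROMPT = ("You are an expert Neo4j Developer translating user questions into "
          "Cypher to answer questions about movies and provide recommendations. "
          "Convert the user's question based on the schema. "
          "Schema: {schema} Question: {question}")

FENCE = "`" * 3  # Markdown code fence the model may wrap around the query


def text_to_cypher(question, schema, llm):
    """Fills the prompt, asks the LLM and returns the bare Cypher query."""
    reply = llm(PROMPT.format(schema=schema, question=question))
    pattern = re.escape(FENCE) + r"(?:cypher)?\s*(.*?)" + re.escape(FENCE)
    match = re.search(pattern, reply, re.DOTALL)
    return (match.group(1) if match else reply).strip()
```

The generated query should still be checked before running it, given the limitations discussed p. 32.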

29
Providing source of information
Since the nodes of a graph DB can associate any property with the embeddings of text chunks, it is possible to provide the name of the source used to generate the answer.
[Figure: each :Chunk node of State of the Union.txt holds text, embedding and source (e.g. chunk65); the vector similarity search returns the chunks with their source, which form the context; together with the system prompt and the question, they make up the generation prompt sent to genLLM]
Question: What did the president say about Justice Breyer?
Context:
{'source': 'state_of_the_union.txt#66', score: 0.9248, page_content='[…] One of our nation’s top legal minds, who will continue Justice Breyer’s legacy of excellence.'},
{'source': 'state_of_the_union.txt#65', score: 0.9113, page_content='[…] Tonight, I’d like to honor someone who has dedicated his life to serve this country: Justice Stephen Breyer—an Army veteran, […]'}
from langchain.chains import RetrievalQAWithSourcesChain
from langchain_openai import ChatOpenAI

genLLM = ChatOpenAI(temperature=0)
chain = RetrievalQAWithSourcesChain.from_chain_type(
    genLLM, chain_type="stuff",
    retriever=retriever)
chain.invoke(
    {"question": "What did the president say about Justice Breyer"})
result:
{'answer': 'The president honored Justice Stephen Breyer for his service to the country and mentioned his retirement from the United States Supreme Court.',
 'sources': 'chunk#65'}
Example adapted from doc: RetrievalQAWithSourcesChain
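A simplified Python sketch of returning sources with the answer: the source stored on each chunk node travels with the retrieved text. Unlike RetrievalQAWithSourcesChain, which asks the LLM to name the sources it used, this sketch returns every retrieved source, deduplicated in retrieval order; the function name is hypothetical.

```python
def answer_with_sources(question, docs, llm):
    """docs: list of (text, metadata) pairs from the retriever.
    Returns the LLM answer plus the distinct 'source' values."""
    context = "\n".join(f"[{meta['source']}] {text}" for text, meta in docs)
    answer = llm(f"Context:\n{context}\n\nQuestion: {question}")
    sources = list(dict.fromkeys(meta["source"] for _, meta in docs))
    return {"answer": answer, "sources": ", ".join(sources)}
```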

30
Entities and relations extraction
to build Knowledge Graphs
Apart from adding subsymbolic search capabilities (= semantic vector search), LLMs are used to extract entities and relations from plain text*.
We've seen that entity extraction can be used to search the graph DB with a full-text index. It can also be used to build a graph DB.
* Usually called « unstructured data », which actually means not formally structured (literature and linguistics study the structures of plain texts).
You can try entities and relations extraction without code with the LLM graph builder app. Parameter adjustments are currently limited though.
Text → Entities & Relationships → Knowledge Graph**
Text: Elizabeth I was Queen of England and Ireland from 17 November 1558 until her death in 1603
Entities: [(id='Elizabeth I', type='Person'), (id='Queen Of England And Ireland', type='Position'), (id='17 November 1558', type='Date'), (id='1603', type='Date')]
Relationships: ('Elizabeth I')-['HELD_POSITION_FROM']->('Queen Of England And Ireland'), ('Elizabeth I')-['DIED']->('1603'), ('Elizabeth I')-['STARTED_POSITION']->('17 Nov 1558')
** Although the term Knowledge Graph (KG) is currently often used in a broader sense, I prefer to reserve it for a kind of graph in which entities and relations have been organized by humans. In my opinion, automatic extraction (= calculation) of entities & relations can be a useful tool to build a KG, but it does not produce a KG by itself: it takes human adjustments to make sense of a graph and turn it into a KG.
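Once entities and relationships are extracted, they have to be written to the graph without duplicates, which is what Cypher MERGE does. A Python sketch of that deduplication on (source, type, target) triples; the triple format and function name are our own assumptions.

```python
def merge_triples(triples):
    """triples: ((source_id, source_type), rel_type, (target_id, target_type)).
    Like Cypher MERGE, an entity or a relation extracted twice is stored once.
    Returns (nodes: id -> type, relationships: set of (source, type, target))."""
    nodes, rels = {}, set()
    for (src_id, src_type), rel_type, (tgt_id, tgt_type) in triples:
        nodes.setdefault(src_id, src_type)
        nodes.setdefault(tgt_id, tgt_type)
        rels.add((src_id, rel_type, tgt_id))
    return nodes, rels


person = ("Elizabeth I", "Person")
triples = [
    (person, "DIED", ("1603", "Date")),
    (person, "DIED", ("1603", "Date")),  # extracted twice
    (person, "HELD_POSITION_FROM", ("Queen Of England And Ireland", "Position")),
]
```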

31
Entities and relations extraction with LLMGraphTransformer
With LangChain's LLMGraphTransformer you can adjust how entities and relations are extracted, for example by specifying allowed_nodes (and/or allowed_relationships) to help prevent over-generation.
LLMGraphTransformer is based on ChatOpenAI.with_structured_output().
from langchain_experimental.graph_transformers import LLMGraphTransformer
from langchain_openai import ChatOpenAI
from langchain_core.documents import Document

# The prompt used by LLMGraphTransformer is tuned for GPT-4
gpt4 = ChatOpenAI(temperature=0,
                  model_name="gpt-4",
                  openai_api_key=OPENAI_API_KEY)
text = ("Elizabeth I was Queen of England and Ireland "
        "from 17 November 1558 until her death in 1603")
documents = [Document(page_content=text)]
extraction = LLMGraphTransformer(llm=gpt4,
                                 allowed_nodes=["Person", "Location", "Date"])
graph_docs = extraction.convert_to_graph_documents(documents)
print(f"Nodes: {graph_docs[0].nodes}")
print(f"Relationships: {graph_docs[0].relationships}")
Doc: LLMGraphTransformer
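The effect of allowed_nodes can be pictured as a post-filter: nodes of other types are dropped, and so are relationships pointing to them. This is a simplified sketch of the idea; LangChain's own handling (which may also involve the prompt) may differ in detail.

```python
def keep_allowed(nodes, relationships, allowed_nodes):
    """nodes: list of (id, type); relationships: list of (source_id, type, target_id).
    Drops nodes whose type is not allowed, then every relationship that
    touches a dropped node."""
    kept = [(nid, ntype) for nid, ntype in nodes if ntype in allowed_nodes]
    kept_ids = {nid for nid, _ in kept}
    kept_rels = [r for r in relationships if r[0] in kept_ids and r[2] in kept_ids]
    return kept, kept_rels


extracted_nodes = [("Elizabeth I", "Person"),
                   ("Queen Of England And Ireland", "Position"),
                   ("1603", "Date")]
extracted_rels = [("Elizabeth I", "HELD_POSITION_FROM", "Queen Of England And Ireland"),
                  ("Elizabeth I", "DIED", "1603")]
```

With allowed_nodes=["Person", "Location", "Date"], the Position node of the previous slide and its relation disappear.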

32
Limitations
• LLM results are stochastic. This means different runs do NOT get you exactly the same results.
• Given the previous limitation, Cypher generation is often inconsistent. There are attempts to correct the generated query with the DB schema, but this only works if the user knows what is in the DB and how to ask, so I have not yet seen a real-life use case where Cypher generation seems to make more sense than parameterized queries.
• Entities / relations extraction with LLMs tends to over-generate; you need to adjust it. Human work is still required to obtain a valuable Knowledge Graph.
• LLMGraphTransformer for entities/relations extraction (Neo4j/LangChain) is optimized for GPT-4, which has a more expensive token price.
• Neo4j vector indexes limitations & known issues: https://neo4j.com/docs/cypher-manual/current/indexes/semantic-indexes/vector-indexes/#limitations-and-issues

33
Graph DB & LLM: Wrap up
• The main benefits of combining a graph DB with LLMs are to:
  – Give LLMs updated / private information extracted by any one, or a combination, of 3 possible kinds of search:
    • keyword search or Lucene full-text search on node and relation properties,
    • symbolic (= explicit) relations by graph pattern matching,
    • subsymbolic (= non-deterministic vector similarity) relations.
    Provided with more accurate context, LLMs deliver more accurate answers.
  – Provide the information source of the generated answer.
• LLMs can also be used to extract entities and relations from plain text.
• Experiments are underway for graph query generation from text with LLMs.
Finding information by combining keyword search, symbolic and subsymbolic relations seems like a very promising aspect of the LPG DB & LLM wedding!

Resources
General & introductions
• GraphAcademy course - Introduction to Vector Indexes and Unstructured Data
• Neo4j Cypher manual - Vector indexes
• GraphAcademy course - Neo4j & LLM Fundamentals
• LangChain documentation - Neo4j Vector store
• Neo4j genAI Python package by Will Tai
• Building RAG Applications With the Neo4j GenAI Stack: A Comprehensive Guide by Yu Fanghua
Code examples
• Tomaz Bratanic's GitHub repo of Jupyter Notebooks https://github.com/tomasonjo/blogs supporting his blog posts https://bratanic-tomaz.medium.com/
• genAI workshop: vector search to content generation. Yeah, the use case is about spamming people to get them to buy more stuff, but well, get the techniques and then do what you want with them!
Further readings
• Research Trends for the Interplay between Large Language Models and Knowledge Graphs, Khorashadizadeh et al., June 2024
• Paco Nathan's collection of arXiv articles about graph & LLM
LangChain Integration
• LangChain Neo4j Starter Kit by Jason Koo
• Using a Knowledge Graph to implement a DevOps RAG application by Tomaz Bratanic
• How to use LangChain's LLMGraphTransformer with CassandraGraphStore
LlamaIndex integration
• Multimodal RAG pipeline with LlamaIndex and Neo4j
• LlamaIndex Webinar: Advanced RAG with Knowledge Graphs (with Tomaz from Neo4j)
LLM costs
• Great overview of different LLM costs and how to calculate them, by Joanna Stoffregen
NB: costs currently evolve very quickly
Icon credits
• Chatbot by SUBAIDA
• Prompt from article by Akash Verma
• Retriever adapted from Dog fetching Newspaper by Gan Khoon Lay
• Vector store adapted from medical store by Vectors Point
• Other icons by Véronique Gendner – e-tissage.net
