Let Your Graph Do the Talking: Encoding Structured Data for LLMs

Bryan Perozzi¹, Bahare Fatemi¹, Dustin Zelle¹, Anton Tsitsulin¹, Mehran Kazemi¹, Rami Al-Rfou², Jonathan Halcrow¹

¹Google Research  ²Waymo Research. Correspondence to: Bryan Perozzi <[email protected]>.

arXiv:2402.05862v1 [cs.LG] 8 Feb 2024

Abstract

How can we best encode structured data into sequential form for use in large language models (LLMs)? In this work, we introduce a parameter-efficient method to explicitly represent structured data for LLMs. Our method, GraphToken, learns an encoding function to extend prompts with explicit structured information. Unlike other work which focuses on limited domains (e.g., knowledge graph representation), our work is the first effort focused on the general encoding of structured data to be used for various reasoning tasks. We show that explicitly representing the graph structure allows significant improvements to graph reasoning tasks. Specifically, we see across-the-board improvements of up to 73 percentage points on node-, edge-, and graph-level tasks from the GraphQA benchmark.

[Figure 1: a frozen LLM answering "Is there a cycle in this graph?" from a graph G and question Q under two encodings: (a) a fixed text encoding Text(G); (b) the learned GraphToken(G) encoding.]

Figure 1. Graph encoding options for a frozen LLM. (a) Fixed encoding, e.g., (Fatemi et al., 2024; Wang et al., 2023b; Stechly et al., 2023). (b) This work proposes GraphToken, a learned graph prompt function to explicitly encode graphs in a parameter-efficient way.

1. Introduction

There has been an explosion of recent excitement around using LLMs (Vaswani et al., 2017; Devlin et al., 2018; Radford et al., 2018; Raffel et al., 2020; Brown et al., 2020; Touvron et al., 2023; Zhao et al., 2023) to represent, process, and analyze textual data. These models typically take sequential text as their input, but recent work has extended inputs to spatial and temporal modalities (e.g., images (Chen et al., 2022) and video (Arnab et al., 2021)).

Despite their success, current realizations of LLMs have noticeable problems, including a tendency to generate outputs which are untrue or unsupported by their prompt, commonly referred to as hallucinations (Wang et al., 2023a). Another intimately related issue is the problem of freshness, where the knowledge required to answer a query exists only after an LLM's training date (Vu et al., 2023). One mitigation for these problems is the enrichment of the prompt with additional factual and fresh data. As Kadavath et al. (2022) showed, when LLMs are supplied with new and supporting information, they are capable of adapting their parametric beliefs to effectively incorporate new evidence.

An automatic way to enrich the input context of an LLM with factual and fresh information is Retrieval Augmented Generation (RAG) (Khandelwal et al., 2019; Guu et al., 2020). RAG works by augmenting the prompt with additional relevant, factual, and fresh information, drawn from sources such as web searches or private databases. Often this information is in the form of structured data: data that has complex dependencies between different, discrete parts of the whole. For example, private relational databases, social networks, and molecules all have relational information between their discrete data items.

Structured data is ubiquitous in the real world, it surrounds our daily lives, and understanding how to represent this data optimally for inclusion in LLMs is a crucial research question. The predominant mode of encoding structured data for LLMs is to use various types of hand-crafted, text-based serialization (Guo et al., 2023; Wang et al., 2023b; Stechly et al., 2023); see Figure 1 (a). This approach can impose significant decoding complexity for the language model: from any text description, the model must first correctly decode and understand the structure before it can utilize the information. Recently, Fatemi et al. (2024) demonstrated that pure text representations of structured data are insufficient for graph reasoning with LLMs. They show that LLMs are not able to utilize structure efficiently when posed with common reasoning tasks that are easily answered by classical graph algorithms. This highlights the need to explore better and more expressive ways of representing structured data to an LLM.

In this paper, we propose GraphToken (Figure 1 (b)), a parameter-efficient method for representing structured data for LLMs. Pre-training LLMs on text corpora closely related to the desired reasoning task can enhance performance, but it can be computationally expensive, particularly for larger models. Additionally, fine-tuning requires domain-specific data and human expertise, further increasing costs. Inspired by recent advancements in parameter-efficient fine-tuning (Lester et al., 2021; Xu et al., 2023), our method, GraphToken, learns an encoding function that generates fine-tuned soft-token prompts. The soft-token prompt extends a textual prompt with explicit GraphToken-encoded structural information, allowing us to train only a trivial number of GraphToken parameters compared to the total LLM parameter budget.

Our work is the first to develop parameter-efficient encoders specifically for general reasoning tasks on structured data. We demonstrate that explicitly representing structure leads to significant improvement on the comprehensive GraphQA benchmark (Fatemi et al., 2024).

Our Contributions. We propose the following innovations:
• GraphToken, a novel parameter-efficient encoder for structured data inclusion in LLMs.
• Extensive experiments on various graph reasoning tasks showing that our method significantly improves LLM capabilities.
• Analysis demonstrating that the GraphToken encoder generalizes to both unseen tasks and graphs.

2. Background

We introduce the related work in LLMs, prompting methods, Graph Neural Networks (GNNs), graph encoders, and graph models combined with LLMs.

2.1. Large Language Models

Pre-Trained Large Language Models (LLMs): Language models (Rosenfeld, 2000; Zhao et al., 2023) are probabilistic models that assign probabilities to sequences of words by breaking the probability of a sequence into the product of the probabilities of the next tokens given the previous ones. While earlier models were mainly based on N-gram models (Jurafsky, 2021), newer models adopted neural approaches with the advent of distributed word representations (Bengio et al., 2000; Mikolov et al., 2013). The increased power offered by neural language models and the increase in model and dataset sizes have led to a new learning paradigm where large language models (LLMs) are pre-trained in an unsupervised way on massive amounts of textual data and are used as base (foundation) models (Devlin et al., 2018; Radford et al., 2019). For each downstream application, the base model is fine-tuned on small amounts of task-specific data to adapt it to the task.

Parameter-Efficient Fine-Tuning: With the rapid growth in the number of parameters of state-of-the-art LLMs (Achiam et al., 2023; Team et al., 2023), fine-tuning for each downstream task has become prohibitively expensive in both time and resources. The goal of parameter-efficient fine-tuning (PEFT) (Xu et al., 2023) is to adapt models to new tasks by updating only a small number of (possibly new) parameters. There are a few dominant PEFT approaches:
• Adapter-based approaches (Houlsby et al., 2019; He et al., 2021) hold the LLM parameters frozen and add new trainable parameters to various parts of the model, with the main differentiating factor between approaches being where the adapter parameters are added.
• LoRA and its variants (Hu et al., 2021; Edalati et al., 2022; Valipour et al., 2022) similarly hold the LLM parameters frozen and add new trainable parameters; however, these trainable parameters are added to the frozen LLM parameters such that the fine-tuned LLM is identical in architecture to the initial LLM, but with only those added parameters updated.
• Partial fine-tuning and partial masking approaches (Zhao et al., 2020; Zaken et al., 2021) only fine-tune or mask a subset of the LLM parameters; no new parameters are introduced.
• Finally, soft-prompt approaches (Li & Liang, 2021; Lester et al., 2021) prepend tokens with learnable parameters to the beginning of the LLM input or to the beginning of every LLM layer; like adapter-based and LoRA approaches, they hold the actual LLM parameters frozen.

Our work falls under the umbrella of soft-prompt approaches but can be extended to other PEFT approaches as well. Most relevant to our work is Levine et al. (2022), where the input is fed into a separate trainable neural network to produce the soft-prompt. We extend this to encoding structured data input via a GNN to produce the LLM soft-prompt.
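To make the soft-prompt mechanism concrete, the following is a minimal NumPy sketch of the idea from Lester et al. (2021) that GraphToken builds on. All sizes and names here are illustrative assumptions, not the paper's implementation:

```python
# A minimal sketch (not the paper's implementation) of soft prompting: a small
# matrix of trainable vectors is prepended to the frozen LLM's input embeddings.
import numpy as np

rng = np.random.default_rng(0)
VOCAB_SIZE, D_MODEL, NUM_SOFT_TOKENS = 1_000, 64, 10   # tiny illustrative sizes

# Frozen piece: the LLM's token embedding table (never updated).
token_embeddings = rng.normal(size=(VOCAB_SIZE, D_MODEL)).astype(np.float32)

# Trainable piece: the soft prompt, shared across all problem instances.
soft_prompt = rng.normal(scale=0.02, size=(NUM_SOFT_TOKENS, D_MODEL)).astype(np.float32)

def build_llm_input(token_ids: np.ndarray) -> np.ndarray:
    """Prepend the learned soft-prompt vectors to the embedded text prompt."""
    text_embeds = token_embeddings[token_ids]          # (seq_len, d_model), frozen
    return np.concatenate([soft_prompt, text_embeds])  # (k + seq_len, d_model)

# During tuning, gradients of the task loss flow only into `soft_prompt`.
# GraphToken replaces this *static* matrix with the output of a graph encoder,
# so the prepended vectors change with every input graph.
example_ids = np.array([17, 433, 901])   # stand-in token ids for a question
print(build_llm_input(example_ids).shape)  # (13, 64)
```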


2.2. Graph Encoding with Neural Networks

The field of graph representation learning (Chami et al., 2022) seeks ways to represent structured data (i.e., discrete and relational) in a continuous domain, typically for use in a downstream machine learning task. While seminal work like DeepWalk (Perozzi et al., 2014) popularized the node embedding problem, later work utilized GNNs to generalize and learn representations of the entire graph (graph embeddings). Many approaches to learning graph representations (node or graph embeddings) have followed (Tsitsulin et al., 2018; Xie et al., 2022).

2.3. Graphs and LLMs

The confluence of graph representation learning and reasoning with LLMs is a rapidly growing field of research: like language, structured data surrounds us, but, unlike LLM input, it is not sequential. Some of the first graphs in this literature are knowledge graphs, as in (Agarwal et al., 2020), where the retrieval corpus of a retrieval LLM is augmented with text-encoded knowledge graphs. Ye et al. (2023) utilize instruction fine-tuned LLMs for node classification. Similarly, Chen et al. (2023b) leverage LLMs to enhance graph learning models by incorporating rich text attributes. Wang et al. (2023b) showed that language models demonstrate preliminary abilities for graph reasoning tasks. Later, Fatemi et al. (2024) proposed GraphQA, a comprehensive benchmark to systematically evaluate models for graph reasoning tasks, finding that graph reasoning tasks are currently difficult and that scaling laws do not seem to apply. Finally, while there is a growing body of work in pre-training, fine-tuning, and prompt-tuning with GNNs by themselves (Fang et al., 2023; Liu et al., 2023), that research, though conceptually similar, differs crucially from ours: GNN-based approaches lack the textual understanding capabilities that are central to the integration of LLMs with graph learning and reasoning.

3. GraphToken

When considering how to pass structured data to an LLM, there are largely two families of options: (1) encoding it as lexical tokens for LLM embedding, as in (Fatemi et al., 2024), or (2) encoding it directly to a continuous representation via a neural network, skipping any LLM token embedding. While representing a graph as a sequence of lexical tokens has benefits in terms of interpretability, there is often no clear choice of the order in which to sequentially write the structured data. We believe a text encoding of structured data prohibits rich, concise, and expressive representations. Instead, our method eschews representing a graph in text in favor of directly producing, using a GNN as an encoder, the continuous representations for the LLM input. We refer to these new graph-encoder-learned soft-tokens in the LLM embedding space as "graph tokens."

To maintain the reasoning and language capabilities of the LLM, we freeze its parameters and teach the graph encoder to align its output representations with the LLM embedding space: we learn only the parameters of the graph encoder during the training process. This reduces computational requirements significantly (graph encoder parameters constitute a trivial sum compared to the LLM). During our tests, the LLM is prompted with the output of the graph encoder and a task about the graph, for example: 'Does this graph contain a cycle?'. As such, the quality of the results is purely a function of how well the graph encoder represents the answer to the task and how well the LLM interprets that output.

3.1. Architecture

An overview of the architecture is provided in Figure 2. At a high level, our model has only two components. First, the graph encoder takes a graph as input and outputs a fixed number of token embeddings. These tokens are then prepended to the sequence of initial token embeddings in the prompt for an LLM, which is then decoded to produce an answer as text.

Graph Encoder. GNN models range from simple averaging methods to complex models with multi-headed attention, so there is a wide variety of possible graph representations. We suspect that some of these representations are better suited to be consumed by an LLM. Therefore, we conducted a thorough study that includes several well-known graph encoder choices in Section 4.2. Our graph encoder takes the relational structure of the graph as input, using some form of graph positional encoding as node features (either learned, Laplacian, or a combination thereof); see Section 4.2.2 for details. Next, we apply a GNN to obtain a representation of the graph, which we read out using one of a few different techniques depending on the task:
• For graph-level tasks (e.g., cycle check), we do global pooling for readout, taking the mean or sum of the representations over all of the nodes.
• For node-level tasks (e.g., node degree), we separately output the representation of each node. This can optionally be concatenated with a graph-level pooling.
• For edge-level tasks (e.g., edge existence), we use a global representation or the two node-level representations concatenated.

We note that the exact readout option used (e.g., mean or sum pooling) is a hyper-parameter chosen during model selection. Whichever the readout technique, this representation is then projected onto the space of tokens used by the LLM with a final dense layer.
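The following is a minimal plain-NumPy sketch of the readout and projection step just described. The GNN that produces the per-node states is abstracted away, and all dimensions and names are illustrative assumptions rather than the paper's TF-GNN code:

```python
# An illustrative sketch of readout options and the final dense projection into
# the LLM token space (not the implementation used in the paper).
import numpy as np

rng = np.random.default_rng(0)
D_GNN, D_MODEL = 128, 4096           # illustrative sizes

# Final dense layer: maps encoder output into the LLM's token-embedding space.
W_proj = rng.normal(scale=0.02, size=(D_GNN, D_MODEL)).astype(np.float32)
b_proj = np.zeros(D_MODEL, dtype=np.float32)

def readout(node_states, task_level, query_nodes=(), mode="mean"):
    """Turn per-node GNN states (num_nodes, d_gnn) into projected 'graph tokens'."""
    pooled = node_states.mean(axis=0) if mode == "mean" else node_states.sum(axis=0)
    if task_level == "graph":        # e.g. cycle check: global pooling
        tokens = pooled[None, :]
    elif task_level == "node":       # e.g. node degree: one token per queried node
        tokens = node_states[list(query_nodes)]
    elif task_level == "edge":       # e.g. edge existence: global representation
        tokens = pooled[None, :]     # (concatenating the two endpoint states is
                                     #  the other option mentioned in the text)
    else:
        raise ValueError(task_level)
    return tokens @ W_proj + b_proj  # (num_tokens, d_model): ready to prepend

node_states = rng.normal(size=(8, D_GNN)).astype(np.float32)   # toy 8-node graph
print(readout(node_states, "graph").shape)         # (1, 4096)
print(readout(node_states, "node", (0, 3)).shape)  # (2, 4096)
```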


[Figure 2: the GraphToken pipeline. A graph is passed through the GraphToken encoder (feature/positional encoding, graph convolutions, readout) to produce graph tokens, while the question "Is there a cycle in this graph?" is tokenized and embedded through the frozen LLM's embedding lookup; the frozen LLM consumes both and outputs "Yes, there is a cycle in this graph."]

Figure 2. A visual overview of the architecture of GraphToken. The framework takes a graph and a corresponding question as input. The graph encoder takes the graph and generates graph tokens. The question is tokenized and embedded into question tokens. A frozen LLM leverages the graph and question tokens to generate an answer.

LLM. For the experiments in the paper we use PaLM 2 (Anil et al., 2023); however, our method generalizes to nearly any LLM in use today. For our purposes, any language model which can accept a sequence of token embeddings and produce text is acceptable, so long as it is possible to compute a gradient with respect to part of the input sequence.

3.2. Training procedure

Our training procedure is very similar to that used by soft prompting methods (Lester et al., 2021). The training input consists of triples (G, T, A), where G is a graph structure, T is a statement or question describing the task (e.g., 'Does this graph contain a cycle?' for cycle check), and A is the ground truth answer ('Yes, there exists a cycle in this graph.').

In the forward pass, we compute the augmented query Q = E(G) || T(T), concatenating the GraphToken encoding of the graph, E(G), with the initial embedding of the task textual representation, T(T).

We train by optimizing the final LLM perplexity (total log-likelihood), L(A | Q), of the expected answer A with respect to the augmented query Q. We minimize this loss, back-propagating the gradient of L with respect to E(G) to the parameters of the GraphToken encoder, keeping all LLM parameters frozen. We use the Lion optimizer (Chen et al., 2023a) with a learning rate α = 0.05.
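Schematically, one training step looks as follows. The graph encoder, the frozen LLM embedding, and the answer log-likelihood are hypothetical callables standing in for the real models; only the Lion update rule (Chen et al., 2023a) and the α = 0.05 learning rate come from the text, and weight decay is omitted:

```python
# A schematic sketch of one GraphToken training step; `graph_encoder`,
# `llm_embed`, `llm_answer_log_likelihood`, and `grad_fn` are hypothetical
# stand-ins for the frozen LLM and the trainable encoder.
import numpy as np

LEARNING_RATE, BETA1, BETA2 = 0.05, 0.9, 0.99

def train_step(params, momentum, graph, task_text, answer_text,
               graph_encoder, llm_embed, llm_answer_log_likelihood, grad_fn):
    # Forward pass: Q = E(G) || T(T).
    graph_tokens = graph_encoder(params, graph)          # E(G): learned soft tokens
    text_tokens = llm_embed(task_text)                   # T(T): frozen embeddings
    query = np.concatenate([graph_tokens, text_tokens])  # augmented query Q

    # Loss: negative log-likelihood of the answer A given Q (LLM stays frozen).
    loss = -llm_answer_log_likelihood(answer_text, query)

    # Backward pass: gradients w.r.t. the encoder parameters only.
    grads = grad_fn(loss, params)

    # Lion update: step in the sign of an interpolated momentum/gradient direction.
    new_params, new_momentum = {}, {}
    for name, g in grads.items():
        direction = np.sign(BETA1 * momentum[name] + (1.0 - BETA1) * g)
        new_params[name] = params[name] - LEARNING_RATE * direction
        new_momentum[name] = BETA2 * momentum[name] + (1.0 - BETA2) * g
    return new_params, new_momentum, loss
```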
4. Experiments

In this section, we summarize the key experiments conducted with GraphToken. We begin by highlighting some of the most exciting results from our analysis:
• R1: GraphToken demonstrates superior performance compared to established baselines across a comprehensive range of graph reasoning tasks, including graph-level, node-level, and edge-level tasks.
• R2: The performance of different graph convolution architectures varies across tasks. This highlights the importance of carefully choosing the right architecture for the specific graph reasoning problem at hand.
• R3: By intentionally breaking equivariance, we enhance GraphToken's graph reasoning capabilities.

Datasets. We conduct our experiments on the graph reasoning tasks proposed in GraphQA (Fatemi et al., 2024). This dataset presents multiple graph reasoning problems with different difficulty levels. These tasks can be categorized as follows (a small worked example follows below):
• Graph-level: node count (counting the number of nodes in a graph), edge count (counting the number of edges in a graph), cycle check (determining whether a graph contains a cycle), and triangle counting (counting the number of triangles in a graph).
• Node-level: node degree (calculating the degree of a given node in a graph), and connected nodes (finding all the nodes that are connected to a given node in a graph).
• Edge-level: reachability (finding if there is a path from one node to another), edge existence (determining whether a given edge exists in a graph), and shortest path (finding the length of the shortest path from one node to another).

Setting. We implemented GraphToken in TensorFlow (Abadi et al., 2015) using the TF-GNN library (Ferludin et al., 2023). The LLM used in our experiments is the instruction-fine-tuned Flan (Chung et al., 2022) checkpoint of PaLM 2 S (Anil et al., 2023). Experiments were carried out on Google TPUv3 and TPUv5e (Jouppi et al., 2017).
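As an illustration of the task semantics listed above (and not the benchmark's own data-generation code), the ground-truth answers for a toy graph can be computed with networkx:

```python
# Illustrative ground-truth computation for the GraphQA task types on a toy graph.
import networkx as nx

G = nx.Graph([(0, 1), (1, 2), (2, 0), (2, 3)])  # four nodes, one triangle

answers = {
    "node count": G.number_of_nodes(),
    "edge count": G.number_of_edges(),
    "cycle check": not nx.is_forest(G),
    "triangle counting": sum(nx.triangles(G).values()) // 3,
    "node degree (node 2)": G.degree(2),
    "connected nodes (node 2)": sorted(G.neighbors(2)),
    "reachability (0 -> 3)": nx.has_path(G, 0, 3),
    "edge existence (0, 3)": G.has_edge(0, 3),
    "shortest path (0 -> 3)": nx.shortest_path_length(G, 0, 3),
}
for task, answer in answers.items():
    print(f"{task}: {answer}")
```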


Table 1. Results comparing GraphToken against prompt engineering and soft prompting on graph reasoning tasks from the GraphQA-Test benchmark (Fatemi et al., 2024), by simple accuracy. We see that GraphToken substantially improves LLM performance on all graph-, node-, and edge-level tasks. The best result for each task is highlighted in bold and the second best is underlined.

Graph Tasks | Node Tasks | Edge Tasks
Method Node count Edge count Cycle check Triangle counting Node degree Connected nodes Reachability Edge existence Shortest path
ZERO-SHOT 0.217 0.124 0.760 0.015 0.140 0.147 0.849 0.445 0.115
ZERO-COT 0.146 0.094 0.323 0.127 0.104 0.088 0.735 0.335 0.336
FEW-SHOT 0.253 0.120 0.374 0.030 0.174 0.124 0.794 0.368 0.227
COT 0.276 0.128 0.580 0.081 0.292 0.131 0.452 0.428 0.386
COT-BAG 0.269 0.125 0.521 0.081 0.280 0.158 0.452 0.373 0.404
SOFT-PROMPT 0.056 0.018 0.832 0.162 0.098 0.068 0.838 0.544 0.462
GraphToken 0.996 0.426 0.956 0.348 0.962 0.264 0.932 0.738 0.638

Model selection was performed by evaluating performance on GraphQA-Train.

4.1. Experiment 1: GraphToken Performance

In this experiment, we rigorously evaluate the performance of GraphToken against the following comprehensive set of established baselines:
• ZERO-SHOT. In this approach, the model is given a task description and immediately asked to produce the desired output. No additional examples or demonstrations are provided.
• FEW-SHOT. This approach provides the model with a few examples of the task and their desired outputs (Brown et al., 2020). Unlike traditional training, these examples are included directly in the prompt, allowing the model to learn and adapt during inference.
• COT. Chain-of-thought (CoT) prompting (Wei et al., 2022) provides examples each showing step-by-step reasoning, teaching the LLM to generate its own thought processes for tackling new tasks.
• ZERO-COT. Zero-shot CoT (Kojima et al., 2022) builds upon chain-of-thought prompting by eliminating the need for training examples. The LLM generates its own step-by-step reasoning process using a simple trigger phrase like "Let's think step by step".
• COT-BAG. BAG prompting (Wang et al., 2023b) extends COT to improve the performance of LLMs on graph-related tasks by appending "Let's construct a graph with the nodes and edges first" to the prompt.
• SOFT-PROMPT. This approach uses the standard soft prompt tuning of Lester et al. (2021). It optimizes a global static prompt which is shared across problem instances to improve task performance. Unlike our proposed method, it does not have access to the graph information, making the results of this approach equivalent to those of a majority classifier.

Results. The results of this experiment, summarized in Table 1, demonstrate that GraphToken significantly outperforms existing methods on all graph-, node-, and edge-level tasks. While SOFT-PROMPT achieves the second best score on some tasks, this is mainly due to its ability to predict majority labels. For example, 82% of the questions in cycle check are about existent cycles. Similarly, 54% of the questions in edge existence are about non-existent edges.

4.2. Experiment 2: Encoder Design

From the results in Table 1, we can see that graph encoders can significantly improve an LLM's capability on graph reasoning tasks. However, the choice of graph encoder has a significant effect on task performance. Here we study how different architecture choices affect the quality of the graph representation for a language model's use, including the choice of graph convolution, the features available to the network, and the hyper-parameters.

4.2.1. Choice: Graph Convolution

This experiment investigates the impact of graph convolution choice on the performance of GraphToken. We evaluate the following diverse set of encoders:
• Graph Convolutional Network (GCN): One of the earliest GNNs, this model does mean pooling of neighbor features, followed by a non-linear transformation (Kipf & Welling, 2017).
• Message Passing Neural Network (MPNN): A generalization of the GCN, this model allows for more flexible aggregation of neighbor features and adds further non-linear feature transformations (Gilmer et al., 2017).
• Graph Isomorphism Network (GIN): A GNN designed specifically to maximize the expressiveness of the model with respect to a classic graph isomorphism test (Xu et al., 2018).
• Multi-Head Attention (Graph Transformer, MHA): This GNN adapts transformer-style attention, allowing it to learn different ways of passing messages (based on the attention mask) (Dwivedi & Bresson, 2021).
• Heterogeneous Graph Transformer (HGT): Another adoption of transformer-style attention (we note that it can be applied to non-heterogeneous graphs as well) (Hu et al., 2020).
• Linear Aggregation: In addition to the popular encoders from the literature, we also evaluated a family of models which use linear aggregation of features, as this has been shown to work surprisingly well on a number of tasks (Bojchevski et al., 2020).
  - Node Set: This model simply pools all the node features in the graph together.
  - Edge Set: This model simply pools all the edge features together (edge features are defined as the concatenation of the features of an edge's two nodes).

Setting. The experimental setup is similar to the experiment in Section 4.1. Again, GraphQA-Train performance was used for model selection, and we report the corresponding model's results on GraphQA-Test.

Result. Table 2 shows the results for each model on GraphQA-Test. In general, we see that there is no one model that consistently dominates across graph encoding tasks. Instead, we see that different graph encoder architectures have different strengths and weaknesses.

There is one notable pattern, however: the simple linear GNN models perform quite strongly at their respective counting tasks (i.e., NodeSet does well at node count, EdgeSet does well at edge count). However, models with non-linear effects are still capable on these tasks (e.g., MHA does well at node count, and MPNN does well on edge count).


Table 2. Study of individual graph encoder performance on GraphQA-Test tasks. Note that there is 'no free lunch' here: no single encoder examined dominates across all tasks.

Graph Tasks | Node Tasks | Edge Tasks
Method Node count Edge count Cycle check Triangle counting Node degree Connected nodes Reachability Edge existence Shortest path
Non-linear:
GCN 0.746 0.056 0.964 0.208 0.264 0.264 0.918 0.68 0.604
GIN 0.704 0.052 0.898 0.194 0.252 0.18 0.902 0.65 0.586
MPNN 0.792 0.368 0.956 0.348 0.962 0.25 0.934 0.648 0.638
HGT 0.252 0.084 0.934 0.234 0.266 0.184 0.944 0.718 0.6
MHA 0.912 0.264 0.962 0.266 0.552 0.244 0.932 0.738 0.608
Linear:
Node Set 0.996 0.080 0.948 0.198 0.19 0.118 0.942 0.596 0.568
Edge Set 0.618 0.426 0.964 0.228 0.22 0.096 0.904 0.592 0.568

4.2.2. Choice: GNN Features

Recently, positional node encodings (Wang et al., 2022; Dwivedi et al., 2023; Lim et al., 2023) were proposed to enhance the expressivity of GNNs. On the other hand, in molecular modeling it has recently been shown that non-equivariant encoders can match or exceed the quality of equivariant ones (Wang et al., 2023c). This raises a more general question: do GNNs need to be equivariant in order to generalize, especially with extremely powerful decoders such as LLMs?

We investigate this question by testing the graph reasoning capability of GraphToken with three distinct node featurization settings (a small sketch of these featurizations appears at the end of this subsection):
• LPE: Laplacian positional encodings using the normalized Laplacian matrix, as in (Dwivedi et al., 2023).
• IDX: a unique identity encoding designed to break the GNN's equivariance.
• LPE+IDX: a concatenation of the above two strategies.

Setting. The experimental setup is similar to Section 4.2. Node features of dimensionality d = 4 are evaluated for the LPE and IDX featurizations. Models using LPE+IDX contain node features of size d = 8.

Result. The outcome of this experiment is shown in Figure 3, where we plot the difference of all models from the SOFT-PROMPT baseline (Lester et al., 2021) when evaluated on GraphQA-Test. The core result is that learned positional embeddings (Fig. 3b) generally outperform Laplacian positional embeddings (Fig. 3a) for most encoders and most tasks. These results show that breaking equivariance surprisingly adds additional capabilities for graph reasoning when powerful decoders are present. Some additional observations include:
• Counting Tasks. Learned features seem to provide essential lift for basic counting tasks (node count, edge count, and node degree) in many encoders.
• Combination. In some very interesting cases of task and encoder, the combination of both types of features greatly improved model performance (Fig. 3c). For example, GCN and NodeSet significantly improved at the node count task.
• Linear models. NodeSet (an encoder which does not use the graph edges) generally benefited from spectral features, as they added previously unseen information about the graph structure.

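As referenced in Section 4.2.2, the following is a minimal NumPy sketch of the two featurizations: LPE from the eigenvectors of the symmetric normalized Laplacian, and IDX as a per-node identity table (randomly initialized here as a stand-in for a learned embedding). The convention of dropping the trivial eigenvector and the toy graph are illustrative assumptions, not taken from the paper's code:

```python
# Illustrative LPE / IDX node featurizations (not the paper's implementation).
import numpy as np

def laplacian_pe(adj, dim=4):
    """Laplacian positional encodings: eigenvectors of the symmetric normalized
    Laplacian for the smallest non-trivial eigenvalues (one common convention)."""
    deg = adj.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(np.clip(deg, 1.0, None))  # isolated nodes have zero rows anyway
    lap = np.eye(len(adj)) - d_inv_sqrt[:, None] * adj * d_inv_sqrt[None, :]
    _, eigvecs = np.linalg.eigh(lap)                      # eigenvalues in ascending order
    return eigvecs[:, 1:dim + 1]                          # drop the trivial eigenvector

def idx_features(num_nodes, dim=4, seed=0):
    """'IDX' identity features: one vector per node index. Random values stand in
    for a learned table; giving every node its own row breaks permutation equivariance."""
    return np.random.default_rng(seed).normal(scale=0.1, size=(num_nodes, dim))

# Toy example: an 8-node cycle graph.
n = 8
adj = np.zeros((n, n))
for u in range(n):
    adj[u, (u + 1) % n] = adj[(u + 1) % n, u] = 1.0

lpe = laplacian_pe(adj, dim=4)                  # LPE,     d = 4
idx = idx_features(n, dim=4)                    # IDX,     d = 4
lpe_idx = np.concatenate([lpe, idx], axis=1)    # LPE+IDX, d = 8
print(lpe.shape, idx.shape, lpe_idx.shape)      # (8, 4) (8, 4) (8, 8)
```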

[Figure 3: per-task performance deltas versus the SOFT-PROMPT baseline for each encoder, under three featurizations: (a) Spectral Features (LPE), (b) Learned Features (IDX), (c) Learned and Spectral Features (LPE+IDX).]

Figure 3. Effect of varying the node features used in the graph encoder. Results shown are the performance difference from the SOFT-PROMPT baseline on GraphQA-Test. We see that breaking equivariance via learned features (Fig. 3b) generally improves model performance, but the combination of learned and spectral features (Fig. 3c) proves uniquely powerful for some encoders.

Table 3. Total number of parameters in the graph encoder.

Encoder Body Head
GCN 17,152 1.1 × 10^7
GIN 17,152 1.1 × 10^7
MPNN 83,968 1.1 × 10^7
HGT 198,788 1.1 × 10^7
MHA 101,376 1.1 × 10^7
Node Set 0 4.1 × 10^5
Edge Set 0 7.4 × 10^5

Figure 4. UMAP (McInnes et al., 2018) projection of GraphToken embeddings produced by two different encoders, colored by the diameter of a graph. We plot all 8-node graphs.

4.2.3. Parameter Usage in GraphToken

Setting. We consider the graph convolution evaluation from Section 4.2.1, using LPE features with dimensionality d = 4. The graph encoders have a latent space of size 128. We then project this into a prompt embedding with approximately 80,000 parameters in GraphToken.

Results. Table 3 shows the number of parameters used in the graph encoder. Here 'Body' refers to the number of parameters in the graph encoder itself, and 'Head' refers to the parameters in the transformation layer to the higher-dimensional LLM token space.

It is also insightful to consider the number of parameters used in each of the models. Table 3 specifies the total number of parameters used by each GNN architecture. We note that this size is dominated by the total number of parameters in the projection layer to the token space (roughly 11 million). Out of all non-linear architectures, attention-based ones (MHA and HGT) add the most encoder-based parameters. In general, the size of our graph encoder models varies from 17k to 199k parameters. This is significantly smaller than typical LLMs, which currently often contain tens or hundreds of billions of parameters. For example, the open-source Llama 2 language model scales from 7 billion to 70 billion parameters (Touvron et al., 2023), while the closed-source PaLM 1 model contains 540 billion parameters (Chowdhery et al., 2022). In light of this, we can see that GraphToken is highly parameter-efficient: it significantly improves the graph reasoning capability of an LLM while barely adding any parameters at all.

5. Discussion

So far we have examined the performance benefits of GraphToken and the design choices necessary when building a graph encoder. However, several questions remain: (1) what exactly are the encoders doing, and (2) does it generalize? In this section we seek to provide some insight (if not answers) into these questions, and lay the foundations for future work.

5.1. Graph Encoder Analysis

This section studies the properties learned by GraphToken's graph encoders by directly examining the representations they produce. We study both the in-distribution and out-of-distribution properties of these encoders.


Table 4. Predicting bipartiteness using graph encoders trained for different tasks, measured on all graphs with 8 nodes. Observe that graph encoders trained on cycle check and triangle counting generalize well to bipartiteness detection.

Graph Tasks | Node Tasks | Edge Tasks
Method Node count Edge count Cycle check Triangle counting Node degree Connected nodes Reachability Edge existence Shortest path
Non-linear:
GCN 53.82 53.28 55.46 50.00 50.00 54.64 50.00 48.48 51.60
GIN 51.09 53.27 52.74 51.91 53.26 53.57 51.36 52.17 52.18
MPNN 68.01 71.34 56.82 76.82 60.13 60.95 61.77 62.58 54.37
HGT 50.00 54.35 68.53 95.03 50.27 59.81 68.85 74.58 50.00
MHA 50.27 66.39 87.00 72.14 58.74 66.38 51.63 54.12 64.45
Linear:
Node Set 56.55 57.38 56.30 55.74 56.29 56.28 55.73 57.93 56.56
Edge Set 50.82 50.82 50.82 50.55 50.54 50.54 50.82 50.82 50.54

We consider 9 tasks in total: total number of edges; maximum node degree; graph diameter; number of triangles; average local clustering coefficient; largest core number; average shortest path length; testing planarity; and testing bipartiteness.

One benefit of studying graphs is data availability: for small-enough graphs, we can generate all possible graphs exhaustively using geng (McKay et al., 1981). The evaluation goes as follows. First, we train an encoder on a task from GraphQA (e.g., cycle check). Then, to evaluate the cross-task generalizability of the different encoders, we train a kNN classifier (or regressor) with k = 5 on the representations of (i) an exhaustive set of connected graphs with 8 nodes (called graph8c in Balcilar et al. (2021)) and (ii) an exhaustive set of tree graphs with 15 nodes. We note that because we are generating a large set of graphs (e.g., there are 11,117 graphs of size 8) and only trained on GraphQA-Train (1,000 instances), the vast majority of the graphs we are using here are unseen. As an illustration, a UMAP (McInnes et al., 2018) visualization of the embeddings for all 8-node graphs using two GNN encoders is presented in Figure 4.

Results. Since we present a lot of experiments and it is hard to cover them all, we focus here on the task of predicting whether a graph is bipartite and defer the rest to the Appendix. From basic graph theory we know that if there is a triangle or an odd cycle in a graph, it cannot be bipartite. Therefore, we expect the triangle counting and cycle check training objectives to perform well on this task. In Table 4 we can see that this is precisely what happens, with attention-based methods taking the lead. This is an interesting example of generalization from the graph encoders to a new task.

Overall, there is a significant performance gap between different graph encoders, with MPNN and attention-based ones generally being the best. We observe significant correlations in performance for in-distribution learning: for instance, GraphToken trained on edge count performs the best on edge count prediction. What is interesting is that node count performs comparably here. This suggests that graph encoders learn some universal features that are applicable to many different downstream tasks.

5.2. Future Work

This work opens up an exciting new avenue of exploration for reasoning with structured data and LLMs. Some potential directions that we consider particularly exciting include:
• This work just considers existing convolutions and measures their effectiveness. An obvious and essential next step is designing graph convolutions that best support LLMs in various graph reasoning tasks.
• Evaluating the usefulness of this approach for factual grounding. Can we improve the ability of an LLM to answer questions about the data using prompting over knowledge graphs? Could an LLM answer novel questions about a molecule given a GNN-produced representation of it?
• GraphToken improves performance with broken equivariance. Can this result inform other problems with very strong decoder models?
• This work examines how a GNN can be used to enhance LLMs, but what about the reverse? Can we use an LLM to interrogate a GNN to better explain its results or provide higher quality answers?

6. Conclusions

In this work we have studied the structured data encoding problem for LLMs. Our novel method, GraphToken, learns a graph embedding function through the gradients provided by a frozen LLM. GraphToken is especially well suited for projecting structured data into the latent 'prompt space' of an LLM. It is a parameter-efficient method, as it requires training only the graph encoder and does not update LLM parameters. Our extensive experimental analysis across 9 graph reasoning tasks shows that GraphToken greatly improves graph reasoning in LLMs: we observe up to a 73% improvement on the GraphQA benchmark.

There is still much to do! We believe that our approach is fundamental for adapting new structured data sources to LLMs (which are expensive and time consuming to train), and presents a very attractive way of improving fundamental problems in LLMs, including hallucinations, factuality, and freshness.


Acknowledgements

We thank Oleksandr Ferludin, Johannes Gasteiger, Silvio Lattanzi, Vahab Mirrokni and Jan Pfeifer for discussions about the work.

References

Abadi, M., Agarwal, A., Barham, P., Brevdo, E., Chen, Z., Citro, C., Corrado, G. S., Davis, A., Dean, J., Devin, M., Ghemawat, S., Goodfellow, I., Harp, A., Irving, G., Isard, M., Jia, Y., Jozefowicz, R., Kaiser, L., Kudlur, M., Levenberg, J., Mané, D., Monga, R., Moore, S., Murray, D., Olah, C., Schuster, M., Shlens, J., Steiner, B., Sutskever, I., Talwar, K., Tucker, P., Vanhoucke, V., Vasudevan, V., Viégas, F., Vinyals, O., Warden, P., Wattenberg, M., Wicke, M., Yu, Y., and Zheng, X. TensorFlow: Large-scale machine learning on heterogeneous systems, 2015. URL https://www.tensorflow.org/. Software available from tensorflow.org. Cited on page 4.
Achiam, J., Adler, S., Agarwal, S., Ahmad, L., Akkaya, I., Aleman, F. L., Almeida, D., Altenschmidt, J., Altman, S., Anadkat, S., et al. GPT-4 technical report. arXiv preprint arXiv:2303.08774, 2023. Cited on page 2.
Agarwal, O., Ge, H., Shakeri, S., and Al-Rfou, R. Knowledge graph based synthetic corpus generation for knowledge-enhanced language model pre-training. arXiv preprint arXiv:2010.12688, 2020. Cited on page 3.
Anil, R., Dai, A. M., Firat, O., Johnson, M., Lepikhin, D., Passos, A., Shakeri, S., Taropa, E., Bailey, P., Chen, Z., et al. PaLM 2 technical report. arXiv preprint arXiv:2305.10403, 2023. Cited on pages 3 and 4.
Arnab, A., Dehghani, M., Heigold, G., Sun, C., Lučić, M., and Schmid, C. ViViT: A video vision transformer. In ICCV, 2021. Cited on page 1.
Balcilar, M., Héroux, P., Gauzere, B., Vasseur, P., Adam, S., and Honeine, P. Breaking the limits of message passing graph neural networks. In ICML, 2021. Cited on page 8.
Battaglia, P. W., Hamrick, J. B., Bapst, V., Sanchez-Gonzalez, A., Zambaldi, V., Malinowski, M., Tacchetti, A., Raposo, D., Santoro, A., Faulkner, R., Gulcehre, C., Song, F., Ballard, A., Gilmer, J., Dahl, G., Vaswani, A., Allen, K., Nash, C., Langston, V., Dyer, C., Heess, N., Wierstra, D., Kohli, P., Botvinick, M., Vinyals, O., Li, Y., and Pascanu, R. Relational inductive biases, deep learning, and graph networks, 2018. Cited on page 13.
Bengio, Y., Ducharme, R., and Vincent, P. A neural probabilistic language model. NIPS, 2000. Cited on page 2.
Bojchevski, A., Gasteiger, J., Perozzi, B., Kapoor, A., Blais, M., Rózemberczki, B., Lukasik, M., and Günnemann, S. Scaling graph neural networks with approximate pagerank. In KDD, 2020. Cited on page 6.
Brown, T., Mann, B., Ryder, N., Subbiah, M., Kaplan, J. D., Dhariwal, P., Neelakantan, A., Shyam, P., Sastry, G., Askell, A., et al. Language models are few-shot learners. NeurIPS, 2020. Cited on pages 1 and 5.
Chami, I., Abu-El-Haija, S., Perozzi, B., Re, C., and Murphy, K. Machine learning on graphs: A model and comprehensive taxonomy. JMLR, 2022. Cited on page 3.
Chen, X., Wang, X., Changpinyo, S., Piergiovanni, A., Padlewski, P., Salz, D., Goodman, S., Grycner, A., Mustafa, B., Beyer, L., et al. PaLI: A jointly-scaled multilingual language-image model. In ICLR, 2022. Cited on page 1.
Chen, X., Liang, C., Huang, D., Real, E., Wang, K., Liu, Y., Pham, H., Dong, X., Luong, T., Hsieh, C.-J., et al. Symbolic discovery of optimization algorithms. arXiv preprint arXiv:2302.06675, 2023a. Cited on page 4.
Chen, Z., Mao, H., Li, H., Jin, W., Wen, H., Wei, X., Wang, S., Yin, D., Fan, W., Liu, H., et al. Exploring the potential of large language models (LLMs) in learning on graphs. arXiv preprint arXiv:2307.03393, 2023b. Cited on page 3.
Chowdhery, A., Narang, S., Devlin, J., Bosma, M., Mishra, G., Roberts, A., Barham, P., Chung, H. W., Sutton, C., Gehrmann, S., Schuh, P., Shi, K., Tsvyashchenko, S., Maynez, J., Rao, A., Barnes, P., Tay, Y., Shazeer, N., Prabhakaran, V., Reif, E., Du, N., Hutchinson, B., Pope, R., Bradbury, J., Austin, J., Isard, M., Gur-Ari, G., Yin, P., Duke, T., Levskaya, A., Ghemawat, S., Dev, S., Michalewski, H., Garcia, X., Misra, V., Robinson, K., Fedus, L., Zhou, D., Ippolito, D., Luan, D., Lim, H., Zoph, B., Spiridonov, A., Sepassi, R., Dohan, D., Agrawal, S., Omernick, M., Dai, A. M., Pillai, T. S., Pellat, M., Lewkowycz, A., Moreira, E., Child, R., Polozov, O., Lee, K., Zhou, Z., Wang, X., Saeta, B., Diaz, M., Firat, O., Catasta, M., Wei, J., Meier-Hellstern, K., Eck, D., Dean, J., Petrov, S., and Fiedel, N. PaLM: Scaling language modeling with pathways, 2022. Cited on page 7.
Chung, H. W., Hou, L., Longpre, S., Zoph, B., Tay, Y., Fedus, W., Li, E., Wang, X., Dehghani, M., Brahma, S., et al. Scaling instruction-finetuned language models. arXiv preprint arXiv:2210.11416, 2022. Cited on page 4.
Devlin, J., Chang, M.-W., Lee, K., and Toutanova, K. BERT: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805, 2018. Cited on pages 1 and 2.


Dwivedi, V. P. and Bresson, X. A generalization of transformer networks to graphs, 2021. Cited on pages 6 and 13.
Dwivedi, V. P., Joshi, C. K., Luu, A. T., Laurent, T., Bengio, Y., and Bresson, X. Benchmarking graph neural networks. JMLR, 24(43):1–48, 2023. Cited on page 6.
Edalati, A., Tahaei, M., Kobyzev, I., Nia, V. P., Clark, J. J., and Rezagholizadeh, M. KronA: Parameter efficient tuning with Kronecker adapter. arXiv preprint arXiv:2212.10650, 2022. Cited on page 2.
Fang, T., Zhang, Y., Yang, Y., Wang, C., and Chen, L. Universal prompt tuning for graph neural networks, 2023. Cited on page 3.
Fatemi, B., Halcrow, J., and Perozzi, B. Talk like a graph: Encoding graphs for large language models. In ICLR, 2024. Cited on pages 1, 2, 3, 4, and 5.
Ferludin, O., Eigenwillig, A., Blais, M., Zelle, D., Pfeifer, J., Sanchez-Gonzalez, A., Li, W. L. S., Abu-El-Haija, S., Battaglia, P., Bulut, N., Halcrow, J., de Almeida, F. M. G., Gonnet, P., Jiang, L., Kothari, P., Lattanzi, S., Linhares, A., Mayer, B., Mirrokni, V., Palowitch, J., Paradkar, M., She, J., Tsitsulin, A., Villela, K., Wang, L., Wong, D., and Perozzi, B. TF-GNN: Graph neural networks in TensorFlow, 2023. Cited on pages 4 and 13.
Gilmer, J., Schoenholz, S. S., Riley, P. F., Vinyals, O., and Dahl, G. E. Neural message passing for quantum chemistry, 2017. Cited on page 5.
Guo, J., Du, L., and Liu, H. GPT4Graph: Can large language models understand graph structured data? An empirical evaluation and benchmarking. arXiv preprint arXiv:2305.15066, 2023. Cited on page 1.
Guu, K., Lee, K., Tung, Z., Pasupat, P., and Chang, M. Retrieval augmented language model pre-training. In ICML, 2020. Cited on page 1.
He, R., Liu, L., Ye, H., Tan, Q., Ding, B., Cheng, L., Low, J.-W., Bing, L., and Si, L. On the effectiveness of adapter-based tuning for pretrained language model adaptation. arXiv preprint arXiv:2106.03164, 2021. Cited on page 2.
Houlsby, N., Giurgiu, A., Jastrzebski, S., Morrone, B., De Laroussilhe, Q., Gesmundo, A., Attariyan, M., and Gelly, S. Parameter-efficient transfer learning for NLP. In ICML, 2019. Cited on page 2.
Hu, E. J., Shen, Y., Wallis, P., Allen-Zhu, Z., Li, Y., Wang, S., Wang, L., and Chen, W. LoRA: Low-rank adaptation of large language models. arXiv preprint arXiv:2106.09685, 2021. Cited on page 2.
Hu, Z., Dong, Y., Wang, K., and Sun, Y. Heterogeneous graph transformer, 2020. Cited on page 6.
Jouppi, N. P., Young, C., Patil, N., Patterson, D., Agrawal, G., Bajwa, R., Bates, S., Bhatia, S., Boden, N., Borchers, A., Boyle, R., Cantin, P.-l., Chao, C., Clark, C., Coriell, J., Daley, M., Dau, M., Dean, J., Gelb, B., Ghaemmaghami, T. V., Gottipati, R., Gulland, W., Hagmann, R., Ho, C. R., Hogberg, D., Hu, J., Hundt, R., Hurt, D., Ibarz, J., Jaffey, A., Jaworski, A., Kaplan, A., Khaitan, H., Killebrew, D., Koch, A., Kumar, N., Lacy, S., Laudon, J., Law, J., Le, D., Leary, C., Liu, Z., Lucke, K., Lundin, A., MacKean, G., Maggiore, A., Mahony, M., Miller, K., Nagarajan, R., Narayanaswami, R., Ni, R., Nix, K., Norrie, T., Omernick, M., Penukonda, N., Phelps, A., Ross, J., Ross, M., Salek, A., Samadiani, E., Severn, C., Sizikov, G., Snelham, M., Souter, J., Steinberg, D., Swing, A., Tan, M., Thorson, G., Tian, B., Toma, H., Tuttle, E., Vasudevan, V., Walter, R., Wang, W., Wilcox, E., and Yoon, D. H. In-datacenter performance analysis of a tensor processing unit. SIGARCH Comput. Archit. News, 2017. Cited on page 4.
Jurafsky, D. and Martin, J. H. N-gram language models. In Speech and Language Processing (3rd ed.), 2021. Cited on page 2.
Kadavath, S., Conerly, T., Askell, A., Henighan, T., Drain, D., Perez, E., Schiefer, N., Hatfield-Dodds, Z., DasSarma, N., Tran-Johnson, E., et al. Language models (mostly) know what they know. arXiv preprint arXiv:2207.05221, 2022. Cited on page 1.
Khandelwal, U., Levy, O., Jurafsky, D., Zettlemoyer, L., and Lewis, M. Generalization through memorization: Nearest neighbor language models. arXiv preprint arXiv:1911.00172, 2019. Cited on page 1.
Kipf, T. N. and Welling, M. Semi-supervised classification with graph convolutional networks, 2017. Cited on page 5.
Kojima, T., Gu, S. S., Reid, M., Matsuo, Y., and Iwasawa, Y. Large language models are zero-shot reasoners. NeurIPS, 35:22199–22213, 2022. Cited on page 5.
Lester, B., Al-Rfou, R., and Constant, N. The power of scale for parameter-efficient prompt tuning, 2021. Cited on pages 2, 4, 5, and 6.
Levine, Y., Dalmedigos, I., Ram, O., Zeldes, Y., Jannai, D., Muhlgay, D., Osin, Y., Lieber, O., Lenz, B., Shalev-Shwartz, S., Shashua, A., Leyton-Brown, K., and Shoham, Y. Standing on the shoulders of giant frozen language models, 2022. Cited on page 2.


Li, X. L. and Liang, P. Prefix-tuning: Optimizing continuous prompts for generation. arXiv preprint arXiv:2101.00190, 2021. Cited on page 2.
Lim, D., Robinson, J., Zhao, L., Smidt, T., Sra, S., Maron, H., and Jegelka, S. Sign and basis invariant networks for spectral graph representation learning. In ICLR, 2023. Cited on page 6.
Liu, Z., Yu, X., Fang, Y., and Zhang, X. GraphPrompt: Unifying pre-training and downstream tasks for graph neural networks. In Proceedings of the ACM Web Conference 2023, 2023. Cited on page 3.
McInnes, L., Healy, J., and Melville, J. UMAP: Uniform manifold approximation and projection for dimension reduction. arXiv preprint arXiv:1802.03426, 2018. Cited on pages 7 and 8.
McKay, B. D. et al. Practical graph isomorphism. 1981. Cited on page 8.
Mikolov, T., Chen, K., Corrado, G., and Dean, J. Efficient estimation of word representations in vector space. arXiv preprint arXiv:1301.3781, 2013. Cited on page 2.
Perozzi, B., Al-Rfou, R., and Skiena, S. DeepWalk: Online learning of social representations. In KDD, 2014. Cited on page 3.
Radford, A., Narasimhan, K., Salimans, T., Sutskever, I., et al. Improving language understanding by generative pre-training. 2018. Cited on page 1.
Radford, A., Wu, J., Child, R., Luan, D., Amodei, D., Sutskever, I., et al. Language models are unsupervised multitask learners. OpenAI blog, 1(8):9, 2019. Cited on page 2.
Raffel, C., Shazeer, N., Roberts, A., Lee, K., Narang, S., Matena, M., Zhou, Y., Li, W., and Liu, P. J. Exploring the limits of transfer learning with a unified text-to-text transformer. The Journal of Machine Learning Research, 21(1):5485–5551, 2020. Cited on page 1.
Rosenfeld, R. Two decades of statistical language modeling: Where do we go from here? Proceedings of the IEEE, 88(8):1270–1278, 2000. Cited on page 2.
Stechly, K., Marquez, M., and Kambhampati, S. GPT-4 doesn't know it's wrong: An analysis of iterative prompting for reasoning problems. arXiv preprint arXiv:2310.12397, 2023. Cited on page 1.
Team, G., Anil, R., Borgeaud, S., Wu, Y., Alayrac, J.-B., Yu, J., Soricut, R., Schalkwyk, J., Dai, A. M., Hauth, A., et al. Gemini: A family of highly capable multimodal models. arXiv preprint arXiv:2312.11805, 2023. Cited on page 2.
Touvron, H., Martin, L., Stone, K., Albert, P., Almahairi, A., Babaei, Y., Bashlykov, N., Batra, S., Bhargava, P., Bhosale, S., et al. Llama 2: Open foundation and fine-tuned chat models. arXiv preprint arXiv:2307.09288, 2023. Cited on pages 1 and 7.
Tsitsulin, A., Mottin, D., Karras, P., Bronstein, A., and Müller, E. SGR: Self-supervised spectral graph representation learning. arXiv preprint arXiv:1811.06237, 2018. Cited on page 3.
Valipour, M., Rezagholizadeh, M., Kobyzev, I., and Ghodsi, A. DyLoRA: Parameter efficient tuning of pre-trained models using dynamic search-free low-rank adaptation. arXiv preprint arXiv:2210.07558, 2022. Cited on page 2.
Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., Kaiser, Ł., and Polosukhin, I. Attention is all you need. NeurIPS, 30, 2017. Cited on page 1.
Vu, T., Iyyer, M., Wang, X., Constant, N., Wei, J., Wei, J., Tar, C., Sung, Y.-H., Zhou, D., Le, Q., and Luong, T. FreshLLMs: Refreshing large language models with search engine augmentation, 2023. Cited on page 1.
Wang, C., Liu, X., Yue, Y., Tang, X., Zhang, T., Jiayang, C., Yao, Y., Gao, W., Hu, X., Qi, Z., Wang, Y., Yang, L., Wang, J., Xie, X., Zhang, Z., and Zhang, Y. Survey on factuality in large language models: Knowledge, retrieval and domain-specificity, 2023a. Cited on page 1.
Wang, H., Yin, H., Zhang, M., and Li, P. Equivariant and stable positional encoding for more powerful graph neural networks. In ICLR, 2022. Cited on page 6.
Wang, H., Feng, S., He, T., Tan, Z., Han, X., and Tsvetkov, Y. Can language models solve graph problems in natural language? In NeurIPS, 2023b. Cited on pages 1, 3, and 5.
Wang, Y., Elhag, A. A., Jaitly, N., Susskind, J. M., and Bautista, M. A. Generating molecular conformer fields. arXiv preprint arXiv:2311.17932, 2023c. Cited on page 6.
Wei, J., Wang, X., Schuurmans, D., Bosma, M., Xia, F., Chi, E., Le, Q. V., Zhou, D., et al. Chain-of-thought prompting elicits reasoning in large language models. NeurIPS, 2022. Cited on page 5.
Xie, Y., Xu, Z., Zhang, J., Wang, Z., and Ji, S. Self-supervised learning of graph neural networks: A unified review. IEEE TPAMI, 2022. Cited on page 3.
Xu, K., Hu, W., Leskovec, J., and Jegelka, S. How powerful are graph neural networks? arXiv preprint arXiv:1810.00826, 2018. Cited on page 5.


Xu, L., Xie, H., Qin, S.-Z. J., Tao, X., and Wang, F. L. Parameter-efficient fine-tuning methods for pretrained language models: A critical review and assessment. arXiv preprint arXiv:2312.12148, 2023. Cited on page 2.
Ye, R., Zhang, C., Wang, R., Xu, S., and Zhang, Y. Natural language is all a graph needs. arXiv preprint arXiv:2308.07134, 2023. Cited on page 3.
Zaken, E. B., Ravfogel, S., and Goldberg, Y. BitFit: Simple parameter-efficient fine-tuning for transformer-based masked language-models. arXiv preprint arXiv:2106.10199, 2021. Cited on page 2.
Zhao, M., Lin, T., Mi, F., Jaggi, M., and Schütze, H. Masking as an efficient alternative to finetuning for pretrained language models. arXiv preprint arXiv:2004.12406, 2020. Cited on page 2.
Zhao, W. X., Zhou, K., Li, J., Tang, T., Wang, X., Hou, Y., Min, Y., Zhang, B., Zhang, J., Dong, Z., et al. A survey of large language models. arXiv preprint arXiv:2303.18223, 2023. Cited on pages 1 and 2.


A. Appendix
A.1. Graph Encoders
Notation. We briefly describe the notation we will use. The graph G = (V, E) has node set V and edge set E.
While we will only discuss simple graphs, everything discussed can be extended to heterogeneous graphs w.l.o.g. (Battaglia
et al., 2018; Ferludin et al., 2023).
Using the notation of Ferludin et al. (2023), a GNN has two primary operations. First, a next-state function (NextState) computes the hidden state $h_v$ of a node (or the state $m_{(u,v)}$ of an edge) given information from its neighbors and its previous state. Second, an aggregation function (EdgePool) pools the information from a node's immediate neighborhood into a fixed-size representation. More formally, the next state of a node is

$$h_v^{(i+1)} = \mathrm{NextState}_V^{(i+1)}\big(h_v^{(i)}, m_v^{(i+1)}\big).$$

The pooled messages $m_v^{(i+1)}$ are defined as follows:

$$m_{(u,v)}^{(i+1)} = \mathrm{NextState}_E^{(i+1)}\big(h_u^{(i)}, h_v^{(i)}, m_{(u,v)}^{(i)}\big),$$
$$m_v^{(i+1)} = \mathrm{EdgePool}^{(i+1)}\big(h_v^{(i)}, \{m_{(u,v)}^{(i+1)} \mid u \in N(v)\}\big).$$

Different realizations of the NextState and EdgePool functions can implement a wide variety of GNN operations. This can include powerful models which use Transformer-style attention instead of the provided graph edges (Dwivedi & Bresson, 2021).
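To make the template concrete, below is one deliberately simple NumPy realization of the NextState/EdgePool operations, with linear next-state functions, tanh non-linearities, and sum pooling. It is an illustrative sketch, not the TF-GNN implementation used in the paper:

```python
# A minimal message-passing layer following the NextState / EdgePool template.
import numpy as np

def gnn_layer(h, m, edges, w_msg, w_node):
    """One message-passing step.

    h: (num_nodes, d) node states; m: dict mapping a directed edge (u, v) to its
    (d,) message state; edges: list of directed (u, v) pairs;
    w_msg: (3d, d) and w_node: (2d, d) weight matrices.
    """
    # NextState_E: new message on every edge (u, v).
    new_m = {(u, v): np.tanh(np.concatenate([h[u], h[v], m[(u, v)]]) @ w_msg)
             for (u, v) in edges}
    # EdgePool: sum the messages arriving at each node v.
    pooled = np.zeros_like(h)
    for (u, v), msg in new_m.items():
        pooled[v] += msg
    # NextState_V: combine the previous node state with the pooled messages.
    new_h = np.tanh(np.concatenate([h, pooled], axis=1) @ w_node)
    return new_h, new_m

# Toy run on a 3-node path graph with d = 4.
rng = np.random.default_rng(0)
d, edges = 4, [(0, 1), (1, 0), (1, 2), (2, 1)]
h = rng.normal(size=(3, d))
m = {e: np.zeros(d) for e in edges}
w_msg, w_node = rng.normal(size=(3 * d, d)), rng.normal(size=(2 * d, d))
h, m = gnn_layer(h, m, edges, w_msg, w_node)
print(h.shape)  # (3, 4)
```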
The architecture of NodeSet and EdgeSet is shown in Figure 5. Other GNN models have graph convolutions before
node/edge states are read out.

[Figure 5: (a) Node Set architecture, (b) Edge Set architecture; in each, node or edge features pass through shared-weight MLPs, are pooled, and are projected to graph tokens.]

Figure 5. Figurative illustrations of set-based GNN architectures employed in the paper. We pool representations from either nodes or edges, transform them via an MLP with shared weights, pool, and project to the GraphToken space.
shared

A.2. Additional experiments


We present additional results for graph encoder analysis. Tables 5–15 present additional results on more graph properties, as
well as experiments on tree-structured graphs of size 15. In general, complete graph populations demonstrate significantly
better performance than trees – we can attribute that to the fact that GraphToken was trained on diverse sets of data, and
trees are somewhat out-of-distribution. Nevertheless, for all considered cases the best overall encoder model achieved better
results than naïve set encodings.
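The tables below report the kNN probing protocol of Section 5.1 (k = 5 on frozen encoder representations). A minimal scikit-learn sketch of such a probe, assuming the embeddings and graph-property labels have already been computed (the arrays here are random placeholders), is:

```python
# A small sketch of the kNN probe used for the graph encoder analysis; the
# embedding and label arrays are placeholders for real encoder outputs and a
# graph-property oracle.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier, KNeighborsRegressor

def probe(embeddings, labels, regression=False, k=5, seed=0):
    """Fit a kNN probe on graph embeddings and return its held-out score."""
    x_tr, x_te, y_tr, y_te = train_test_split(
        embeddings, labels, test_size=0.25, random_state=seed)
    model = (KNeighborsRegressor if regression else KNeighborsClassifier)(n_neighbors=k)
    model.fit(x_tr, y_tr)
    return model.score(x_te, y_te)  # accuracy for classifiers, R^2 for regressors

# Stand-ins: embeddings of the 11,117 connected 8-node graphs from a trained
# encoder, plus a bipartiteness label per graph.
rng = np.random.default_rng(0)
embeddings = rng.normal(size=(11_117, 128))      # placeholder encoder outputs
is_bipartite = rng.integers(0, 2, size=11_117)   # placeholder property labels
print(f"bipartiteness probe score: {probe(embeddings, is_bipartite):.3f}")
```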


Table 5. Average local clustering coefficient MSE measured on all connected graphs with 8 nodes. We highlight the best performance per
training task in columns.
Graph Tasks Node Tasks Edge Tasks
Method Node count Edge count Cycle check Triangle counting Node degree Connected nodes Reachability Edge existence Shortest path
GCN 1.62 1.67 2.12 4.49 4.49 1.73 4.49 16.57 3.75
Non-linear

GIN 2.18 2.29 2.45 2.60 2.44 2.31 3.73 2.88 3.37
MPNN 1.03 0.95 1.38 0.81 1.50 1.34 1.68 1.87 1.47
HGT 2.63 2.25 2.08 1.23 2.49 2.17 1.90 1.62 2.52
MHA 2.69 1.01 1.23 0.96 1.56 1.25 2.08 1.59 1.29
Linear

Node Set 2.59 2.56 2.59 2.59 2.58 2.60 2.58 2.58 2.56
Edge Set 2.22 2.22 2.22 2.22 2.24 2.23 2.22 2.22 2.23

Table 6. Degree accuracy on all connected graphs with 8 nodes. We highlight the best performance per training task in columns.
Graph Tasks Node Tasks Edge Tasks
Method Node count Edge count Cycle check Triangle counting Node degree Connected nodes Reachability Edge existence Shortest path
GCN 57.46 56.65 52.46 40.09 40.09 57.42 40.09 15.73 40.26
Non-linear

GIN 56.86 56.30 54.55 48.75 55.59 57.56 40.14 50.81 44.83
MPNN 69.45 69.60 67.19 71.84 64.56 67.62 61.37 58.66 63.18
HGT 55.20 55.70 56.54 60.17 56.62 57.65 58.02 59.06 55.46
MHA 54.86 64.33 62.86 65.63 61.67 63.22 56.98 61.60 63.97
Linear

Node Set 54.66 54.91 54.98 55.06 54.78 54.64 54.50 54.94 54.72
Edge Set 63.48 63.37 63.07 63.55 63.08 63.37 63.47 63.06 63.44

Table 7. Diameter Accuracy on all connected graphs with 8 nodes. We highlight the best performance per training task in columns.
Graph Tasks Node Tasks Edge Tasks
Method Node count Edge count Cycle check Triangle counting Node degree Connected nodes Reachability Edge existence Shortest path
GCN 66.86 67.81 66.70 37.37 37.37 68.91 37.37 52.13 55.13
Non-linear

GIN 66.06 64.87 63.97 61.09 64.98 66.43 37.80 60.65 54.82
MPNN 76.92 76.86 73.63 78.33 74.78 77.18 74.42 69.56 76.23
HGT 63.97 65.24 66.88 70.45 65.30 68.45 69.64 68.97 66.04
MHA 63.76 74.17 76.00 74.03 73.50 74.71 68.45 69.32 72.95
Linear

Node Set 67.28 67.24 67.01 66.97 66.81 67.19 67.09 66.87 66.79
Edge Set 66.99 66.51 66.63 66.83 66.65 67.02 66.60 66.93 66.90

Table 8. k-Core Accuracy on all connected graphs with 8 nodes. We highlight the best performance per training task in columns.
Graph Tasks Node Tasks Edge Tasks
Method Node count Edge count Cycle check Triangle counting Node degree Connected nodes Reachability Edge existence Shortest path
GCN 69.49 69.15 66.61 58.33 58.33 69.16 58.33 25.18 61.55
Non-linear

GIN 68.03 65.98 64.85 62.67 66.74 67.84 58.84 63.34 59.08
MPNN 87.42 87.54 81.81 88.63 80.30 83.48 80.08 71.01 82.05
HGT 63.92 65.29 67.00 70.01 65.44 67.32 68.35 70.08 65.13
MHA 64.30 80.80 73.49 80.81 76.98 78.83 69.43 74.21 75.92
Linear

Node Set 68.23 68.74 68.50 68.71 68.07 67.99 68.85 68.17 68.70
Edge Set 66.30 65.78 65.58 66.15 65.76 65.91 65.94 65.77 65.71

Table 9. #edges Accuracy on all connected graphs with 8 nodes. We highlight the best performance per training task in columns.
Graph Tasks Node Tasks Edge Tasks
Method Node count Edge count Cycle check Triangle counting Node degree Connected nodes Reachability Edge existence Shortest path
GCN 38.91 39.19 35.94 11.60 11.60 40.24 11.60 2.19 14.58
Non-linear

GIN 38.13 37.33 36.57 31.66 37.74 38.34 11.88 31.45 25.92
MPNN 86.58 86.72 53.15 84.56 52.12 66.01 50.70 41.96 59.95
HGT 35.63 37.45 38.23 40.39 37.14 37.80 39.68 39.74 36.86
MHA 35.85 55.32 45.04 53.52 47.89 49.44 39.69 42.84 46.17
Linear

Node Set 40.06 40.14 39.40 40.15 39.97 39.72 39.88 39.79 39.89
Edge Set 37.93 38.11 38.05 37.92 38.05 37.67 37.64 37.82 37.91


Table 10. Planarity AUC on all connected graphs with 8 nodes. We highlight the best performance per training task in columns.
Graph Tasks Node Tasks Edge Tasks
Method Node count Edge count Cycle check Triangle counting Node degree Connected nodes Reachability Edge existence Shortest path
GCN 74.18 73.76 72.61 50.00 50.00 74.74 50.00 50.00 49.44
Non-linear

GIN 77.35 73.00 72.06 69.37 74.86 75.85 50.73 68.97 61.58
MPNN 86.14 86.52 84.16 86.64 83.74 85.17 84.32 77.84 85.55
HGT 69.24 71.41 71.02 74.07 71.47 72.20 72.20 73.59 71.55
MHA 69.96 80.87 78.35 80.46 81.53 81.21 74.98 78.29 80.58
Linear

Node Set 78.41 78.76 78.86 78.82 78.18 78.54 78.72 78.76 78.78
Edge Set 72.17 71.64 72.06 72.20 71.93 72.11 72.01 72.27 72.01

Table 11. Shortest path MSE on all connected graphs with 8 nodes. We highlight the best performance per training task in columns.
Graph Tasks Node Tasks Edge Tasks
Method Node count Edge count Cycle check Triangle counting Node degree Connected nodes Reachability Edge existence Shortest path
GCN 2.27 2.24 2.31 6.07 6.07 2.06 6.07 11.09 3.75
Non-linear

GIN 2.57 2.77 2.83 2.93 2.52 2.54 4.84 3.09 3.61
MPNN 0.29 0.29 0.76 0.31 0.71 0.49 0.75 1.58 0.51
HGT 3.03 2.64 2.27 1.60 2.60 2.14 1.80 1.95 2.81
MHA 3.04 0.71 0.95 0.78 1.01 0.74 1.74 1.55 1.05
Linear

Node Set 2.35 2.35 2.35 2.36 2.36 2.35 2.34 2.36 2.34
Edge Set 2.99 2.99 2.99 2.99 2.97 2.97 2.99 2.99 2.99

Table 12. # of triangles MSE on all connected graphs with 8 nodes. We highlight the best performance per training task in columns.
Graph Tasks Node Tasks Edge Tasks
Method Node count Edge count Cycle check Triangle counting Node degree Connected nodes Reachability Edge existence Shortest path
GCN 132.94 129.03 164.53 316.07 316.07 127.17 316.07 690.03 293.53
Non-linear

GIN 152.13 168.35 182.95 201.64 169.71 156.16 251.23 200.45 251.65
MPNN 8.33 7.51 32.08 4.56 51.90 27.18 51.04 124.89 41.73
HGT 191.14 170.71 165.88 126.92 172.84 160.29 156.10 136.22 175.45
MHA 197.36 30.27 96.56 27.10 59.58 52.42 138.48 80.22 60.72
Linear

Node Set 167.81 168.72 167.33 167.40 167.90 167.96 168.57 169.38 166.13
Edge Set 181.44 181.21 181.18 181.32 180.86 179.44 181.08 181.68 181.40

Table 13. Degree Accuracy on all trees with 15 nodes. We highlight the best performance per training task in columns.
Graph Tasks Node Tasks Edge Tasks
Method Node count Edge count Cycle check Triangle counting Node degree Connected nodes Reachability Edge existence Shortest path
GCN 53.57 55.15 55.24 25.91 25.91 54.86 25.91 11.08 36.51
Non-linear

GIN 60.35 58.79 56.36 55.11 59.88 68.04 42.01 66.72 55.25
MPNN 79.37 78.36 59.18 72.35 62.38 65.90 57.37 57.33 58.45
HGT 54.88 55.33 55.34 58.65 54.33 58.84 57.27 57.43 55.34
MHA 59.17 61.61 60.38 57.18 54.99 61.00 52.29 58.56 53.95
Linear

Node Set 65.64 66.32 65.93 66.10 66.13 65.95 66.28 66.22 65.82
Edge Set 69.59 69.87 69.44 69.40 69.86 69.56 69.32 69.55 69.66

Table 14. Diameter Accuracy on all trees with 15 nodes. We highlight the best performance per training task in columns.
Graph Tasks Node Tasks Edge Tasks
Method Node count Edge count Cycle check Triangle counting Node degree Connected nodes Reachability Edge existence Shortest path
GCN 50.77 50.36 49.54 25.97 25.97 50.01 25.97 6.77 26.64
Non-linear

GIN 58.29 54.44 52.24 49.41 51.47 59.62 24.11 58.77 46.27
MPNN 54.24 54.68 54.97 59.29 67.65 63.80 54.13 52.05 59.48
HGT 57.15 54.88 54.90 57.58 57.05 65.22 54.51 58.70 53.07
MHA 53.95 56.63 60.41 54.62 53.39 56.07 52.85 55.17 51.70
Linear

Node Set 61.89 62.68 62.74 62.36 61.99 61.93 62.34 62.49 62.40
Edge Set 56.57 56.19 56.27 56.83 56.25 56.53 56.31 56.72 56.84


Table 15. Shortest path MSE on all trees with 15 nodes. We highlight the best performance per training task in columns.
Graph Tasks Node Tasks Edge Tasks
Method Node count Edge count Cycle check Triangle counting Node degree Connected nodes Reachability Edge existence Shortest path
GCN 12.95 12.31 12.62 26.17 26.17 12.22 26.17 49.78 21.71
Non-linear

GIN 9.57 10.69 11.32 11.88 11.03 8.37 19.35 9.76 14.39
MPNN 4.19 4.54 9.82 4.92 6.87 6.10 11.06 12.10 11.01
HGT 10.57 10.96 11.65 9.09 12.56 8.17 10.76 9.26 10.98
MHA 10.49 9.88 9.51 11.22 12.75 10.52 13.31 10.09 12.78
Linear

Node Set 10.20 10.05 10.13 10.11 10.17 10.21 10.07 10.18 10.03
Edge Set 9.92 9.87 9.92 9.93 9.88 9.88 10.01 9.91 9.87

