This document is a comprehensive literature review on Retrieval-Augmented Generation (RAG) systems, focusing on the optimization of retrieval mechanisms and efficient knowledge access. It covers historical perspectives of pre-RAG systems, foundational concepts of RAG, various retrieval algorithms, vector indexing methods, and empirical evaluation methodologies. The review also identifies research gaps and future directions for advancements in RAG systems.

Algorithmic Optimization of Retrieval Mechanisms in RAG Systems:
A Data Structure-Driven Approach to Efficient Knowledge Access

Enhanced Literature Review

April 15, 2025


Contents

1 Introduction
2 Historical Perspective: Pre-RAG Retrieval Systems
  2.1 Classical Information Retrieval (1960s-1990s)
    2.1.1 Boolean Retrieval Models
    2.1.2 Vector Space Model
    2.1.3 Probabilistic Retrieval Models
  2.2 Early Neural Information Retrieval (2000s-2015)
    2.2.1 Latent Semantic Analysis and Topic Models
    2.2.2 Learning to Rank
    2.2.3 Early Neural Network Approaches
  2.3 Pre-RAG Neural Retrieval and Question Answering (2015-2019)
    2.3.1 Word Embeddings for Retrieval
    2.3.2 Neural Ranking Models
    2.3.3 Open-Domain Question Answering
    2.3.4 BERT and Contextual Embeddings
  2.4 The Emergence of Dense Retrieval (2019-2020)
    2.4.1 Dual Encoder Architectures
    2.4.2 Contrastive Learning for Retrieval
    2.4.3 Pre-RAG Dense Retrieval
  2.5 Transition to RAG (2020)
3 Foundations of RAG Systems
  3.1 Emergence of RAG
  3.2 Dense Passage Retrieval
  3.3 Evolution of RAG Systems
4 Retrieval Algorithms in RAG Systems
  4.1 Fundamental Retrieval Paradigms
    4.1.1 Dense Retrieval
    4.1.2 Sparse Retrieval
    4.1.3 Hybrid Retrieval
  4.2 Advanced Retrieval Techniques
    4.2.1 Multi-stage Retrieval
    4.2.2 Query Reformulation and Expansion
    4.2.3 Contrastive Learning for Retrieval
  4.3 Context-Aware and Efficient Retrieval
5 Vector Indexing Methods for Efficient Retrieval
  5.1 Approximate Nearest Neighbor Search
  5.2 Graph-Based Indexing Methods
    5.2.1 Hierarchical Navigable Small World (HNSW)
    5.2.2 Other Graph-Based Methods
  5.3 Quantization-Based Methods
    5.3.1 Product Quantization
    5.3.2 Optimized Product Quantization
  5.4 Inverted Index-Based Methods
    5.4.1 Inverted File Index (IVF)
    5.4.2 IVF-PQ: Combining Inverted Indices and Product Quantization
  5.5 Hybrid and Multi-Index Approaches
6 Empirical Evaluation Methodologies in RAG Research
  6.1 Benchmark Datasets
    6.1.1 Question Answering Datasets
    6.1.2 Knowledge-Intensive Tasks
    6.1.3 Fact Verification and Hallucination Assessment
  6.2 Evaluation Metrics
    6.2.1 Retrieval-Specific Metrics
    6.2.2 Generation-Specific Metrics
    6.2.3 End-to-End Evaluation Metrics
  6.3 Experimental Design Approaches
    6.3.1 Ablation Studies
    6.3.2 Comparative Analysis
    6.3.3 Cross-Domain Evaluation
    6.3.4 Human Evaluation
  6.4 Evaluation Challenges and Limitations
    6.4.1 Reference Limitations
    6.4.2 Retrieval Evaluation Complexity
    6.4.3 Trade-off Assessment
  6.5 Recent Advances in Evaluation Methodologies
    6.5.1 Automated Factuality Assessment
    6.5.2 Retrieval-Aware Evaluation
    6.5.3 Efficiency-Focused Evaluation
7 Research Gaps and Future Directions
  7.1 Dynamic Knowledge Management
  7.2 Domain-Specific Optimization
  7.3 Theoretical Foundations
  7.4 Evaluation Metrics
  7.5 Hardware Acceleration
  7.6 Integration of Symbolic and Neural Approaches
  7.7 Multi-Modal RAG Systems
  7.8 Privacy-Preserving RAG
8 Conclusion
9 Visual Elements and Comparisons
  9.1 Tables and Comparisons
    9.1.1 Comparison of Foundational RAG Papers
    9.1.2 Comparison of Vector Indexing Methods for RAG Systems
    9.1.3 Comparison of Retrieval Paradigms in RAG Systems
  9.2 Figures and Diagrams
    9.2.1 RAG System Architecture
    9.2.2 HNSW Indexing Structure
10 References

1 Introduction

Retrieval-Augmented Generation (RAG) has emerged as a critical paradigm in the field of
Large Language Models (LLMs), addressing several key limitations of traditional LLMs
such as hallucination, outdated knowledge, and non-transparent reasoning processes.
This literature review examines the academic research on RAG systems, with a particular
focus on algorithmic optimization of retrieval mechanisms and the data structures
that enable efficient knowledge access.

The review is organized into several main sections: historical perspective on pre-
RAG systems, foundations of RAG systems, retrieval algorithms in RAG, vector indexing
methods for efficient retrieval, empirical evaluation methodologies, and research gaps
and future directions. Throughout, we emphasize peer-reviewed academic research that
provides theoretical foundations, empirical evaluations, and state-of-the-art approaches
in each area.

2 Historical Perspective: Pre-RAG Retrieval Systems

To fully appreciate the innovation that Retrieval-Augmented Generation (RAG) represents,
it is essential to understand the historical evolution of information retrieval systems
that preceded it. This section provides a chronological overview of key developments in
information retrieval and question answering systems that laid the foundation for modern
RAG approaches.

2.1 Classical Information Retrieval (1960s-1990s)

2.1.1 Boolean Retrieval Models

The earliest formal information retrieval systems were based on Boolean logic. As
described by Salton and McGill [1983] in their seminal work “Introduction to Modern
Information Retrieval”:

“Boolean retrieval models represent documents and queries as sets of terms, with
retrieval based on exact matching using logical operators (AND, OR, NOT).”

While computationally efficient, these models suffered from significant limitations,
including binary relevance judgments and an inability to rank results by relevance.
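
The set-based matching described above can be illustrated with a minimal sketch. The toy documents and the helper name build_inverted_index are assumptions made for illustration only; they do not come from the cited work.

```python
from collections import defaultdict

def build_inverted_index(docs):
    """Map each term to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return index

# Hypothetical toy collection.
docs = {
    1: "dense retrieval with neural encoders",
    2: "sparse retrieval with inverted indices",
    3: "neural ranking models",
}
index = build_inverted_index(docs)
all_ids = set(docs)

and_result = index["retrieval"] & index["neural"]              # retrieval AND neural
or_result = index["dense"] | index["sparse"]                   # dense OR sparse
not_result = index["retrieval"] & (all_ids - index["neural"])  # retrieval AND NOT neural

print(and_result, or_result, not_result)
```

Every result is an unordered set: a document either matches or it does not, which is the ranking limitation noted above.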

2.1.2 Vector Space Model

Salton et al. [1975] introduced the Vector Space Model (VSM) in “A Vector Space Model
for Automatic Indexing,” which represented a fundamental shift in information retrieval:

“The vector space model represents documents and queries as vectors in a high-
dimensional space, with each dimension corresponding to a term in the vocabulary.
Similarity between documents and queries is computed using measures such as cosine
similarity.”

This approach enabled ranked retrieval based on degrees of similarity rather than
binary matching, significantly improving retrieval effectiveness.
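
As a rough illustration of ranked retrieval in the vector space model, the sketch below weights terms with TF-IDF and ranks documents by cosine similarity. The smoothed IDF formula, the toy texts, and the shortcut of computing document frequencies over the documents plus the query are simplifying assumptions, not details taken from Salton et al. [1975].

```python
import math
from collections import Counter

def tfidf_vectors(texts):
    """Return the vocabulary and one TF-IDF vector (list of floats) per text."""
    tokenized = [t.lower().split() for t in texts]
    vocab = sorted({w for doc in tokenized for w in doc})
    n = len(tokenized)
    # Document frequency: number of texts containing each term.
    df = {w: sum(1 for doc in tokenized if w in doc) for w in vocab}
    vectors = []
    for doc in tokenized:
        tf = Counter(doc)
        # Smoothed IDF (an assumption) keeps every weight positive.
        vectors.append([tf[w] * (math.log((1 + n) / (1 + df[w])) + 1.0) for w in vocab])
    return vocab, vectors

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

docs = ["neural dense retrieval", "boolean keyword search", "dense vector retrieval"]
query = "dense retrieval"
_, vecs = tfidf_vectors(docs + [query])
doc_vecs, query_vec = vecs[:-1], vecs[-1]
scores = [cosine(query_vec, d) for d in doc_vecs]
ranking = sorted(range(len(docs)), key=lambda i: scores[i], reverse=True)
print(ranking, scores)
```

Unlike Boolean matching, every document receives a graded score, so partially matching documents can still be ordered.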

2.1.3 Probabilistic Retrieval Models

Robertson and Spärck Jones [1976] developed the probabilistic relevance framework in
“Relevance Weighting of Search Terms,” which provided a theoretical foundation for rank-
ing documents by their probability of relevance to a query:

“The probabilistic approach to retrieval is based on estimating the probability that a
document will be judged relevant to a particular query.”

This work led to the development of the BM25 ranking function [Robertson et al.,
1995], which remains influential in modern retrieval systems and serves as a strong base-
line in RAG research.
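
For readers unfamiliar with BM25, the following is a minimal sketch of the scoring function. The parameter values k1 = 1.5 and b = 0.75 and the "+1" smoothed IDF variant are common conventions assumed here; the source does not specify them, and production systems differ in these details.

```python
import math
from collections import Counter

def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Score every document against the query with a basic BM25 formula."""
    tokenized = [d.lower().split() for d in docs]
    n_docs = len(tokenized)
    avgdl = sum(len(d) for d in tokenized) / n_docs
    scores = []
    for doc in tokenized:
        tf = Counter(doc)
        score = 0.0
        for term in query.lower().split():
            n_t = sum(1 for d in tokenized if term in d)   # document frequency
            idf = math.log((n_docs - n_t + 0.5) / (n_t + 0.5) + 1.0)
            # Term frequency saturates with k1; b controls length normalization.
            norm = tf[term] + k1 * (1 - b + b * len(doc) / avgdl)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores

docs = [
    "bm25 is a ranking function",
    "dense retrieval uses embeddings",
    "bm25 bm25 ranking baseline for retrieval",
]
scores = bm25_scores("bm25 ranking", docs)
print(scores)
```

The saturation term means a second occurrence of a query term adds less than the first, which is one reason BM25 tends to behave better than raw term counts.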

2.2 Early Neural Information Retrieval (2000s-2015)

2.2.1 Latent Semantic Analysis and Topic Models

Deerwester et al. [1990] introduced Latent Semantic Analysis (LSA) in “Indexing by
Latent Semantic Analysis,” which used singular value decomposition to identify latent
semantic structures in document-term matrices:

“LSA addresses the problems of synonymy and polysemy by mapping terms and documents
to a lower-dimensional ‘semantic space’.”
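
The effect of the truncated decomposition can be seen on a small example. The sketch below is illustrative only: the term-document counts are made up, and k = 2 is an arbitrary choice of latent dimensions.

```python
import numpy as np

# Hypothetical term-document count matrix: rows are terms, columns are documents.
terms = ["car", "automobile", "engine", "flower", "petal"]
X = np.array([
    [2, 0, 1, 0],   # car
    [0, 2, 1, 0],   # automobile
    [1, 1, 2, 0],   # engine
    [0, 0, 0, 2],   # flower
    [0, 0, 0, 1],   # petal
], dtype=float)

def cosine(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

# Truncated SVD: keep only the k largest singular values.
U, s, Vt = np.linalg.svd(X, full_matrices=False)
k = 2
term_vecs = U[:, :k] * s[:k]   # terms projected into the latent space

raw_sim = cosine(X[0], X[1])                        # car vs automobile, raw counts
latent_sim = cosine(term_vecs[0], term_vecs[1])     # same pair, latent space
latent_unrelated = cosine(term_vecs[0], term_vecs[3])  # car vs flower, latent space
print(raw_sim, latent_sim, latent_unrelated)
```

In this toy matrix "car" and "automobile" rarely co-occur, so their raw similarity is low, yet they share the context term "engine" and end up close together in the latent space.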

Building on this concept, Blei et al. [2003] developed Latent Dirichlet Allocation
(LDA) in “Latent Dirichlet Allocation,” a generative probabilistic model for collections
of discrete data such as text corpora:

“LDA is a three-level hierarchical Bayesian model, in which each item of a collection
is modeled as a finite mixture over an underlying set of topics.”

These approaches represented early attempts to capture semantic relationships beyond
lexical matching, a key goal that would later be more effectively addressed by neural
methods.

2.2.2 Learning to Rank

The learning to rank paradigm, formalized by Liu [2009] in “Learning to Rank for Infor-
mation Retrieval,” applied machine learning techniques to optimize ranking functions:

“Learning to rank aims to automatically learn a ranking model from training data, such
that the model can sort new objects according to their degrees of relevance, preference,
or importance.”

This approach marked a shift toward data-driven optimization of retrieval systems
and introduced the use of multiple features for ranking, beyond simple term statistics.

2.2.3 Early Neural Network Approaches

Huang et al. [2013] presented one of the first successful applications of deep learning
to information retrieval in “Learning Deep Structured Semantic Models for Web Search
using Clickthrough Data”:

“We develop a series of new latent semantic models with a deep structure that project
queries and documents into a common low-dimensional space where the relevance of a
document given a query is readily computed as the distance between them.”

This work demonstrated the potential of neural networks to learn effective representa-
tions for information retrieval, setting the stage for later developments in neural retrieval
models.

2.3 Pre-RAG Neural Retrieval and Question Answering (2015-2019)

2.3.1 Word Embeddings for Retrieval

Mikolov et al. [2013] introduced word2vec in “Efficient Estimation of Word
Representations in Vector Space,” which enabled the representation of words as dense
vectors capturing semantic relationships:

“We propose two novel model architectures for computing continuous vector represen-
tations of words from very large data sets.”

These word embeddings were quickly applied to information retrieval tasks, with Vulić
and Moens [2015] demonstrating their effectiveness in “Monolingual and Cross-Lingual
Information Retrieval Models Based on (Bilingual) Word Embeddings.”

2.3.2 Neural Ranking Models

Guo et al. [2016] introduced DRMM (Deep Relevance Matching Model) in “A Deep Rel-
evance Matching Model for Ad-hoc Retrieval,” which specifically addressed the unique
characteristics of relevance matching in information retrieval:

“Unlike semantic matching in many natural language processing tasks, relevance
matching in ad-hoc retrieval focuses on the relevance of a document to a query, which
has different characteristics.”

This work highlighted the distinction between semantic matching (finding text with
similar meaning) and relevance matching (finding documents relevant to a query), a
crucial insight for retrieval systems.

2.3.3 Open-Domain Question Answering

Chen et al. [2017] presented DrQA in “Reading Wikipedia to Answer Open-Domain Ques-
tions,” which combined a document retriever with a machine reading component:

“Our approach combines a search component based on bigram hashing and TF-IDF
matching with a multi-layer recurrent neural network model trained to extract answers
from text.”

This two-stage approach—retrieve relevant documents, then extract answers—foreshadowed
the architecture of RAG systems, though it relied on traditional sparse retrieval methods
and treated the retrieval and reading components as separate modules.

2.3.4 BERT and Contextual Embeddings

Devlin et al. [2019] introduced BERT in “BERT: Pre-training of Deep Bidirectional Trans-
formers for Language Understanding,” which revolutionized natural language processing
with contextual word representations:

“BERT is designed to pre-train deep bidirectional representations from unlabeled text
by jointly conditioning on both left and right context in all layers.”

Nogueira and Cho [2019] quickly applied BERT to information retrieval in “Pas-
sage Re-ranking with BERT,” demonstrating significant improvements in retrieval per-
formance:

“We present a simple re-ranking approach using BERT that achieves state-of-the-art
results on the MSMARCO-Passage Re-Ranking task.”

This work established the effectiveness of transformer-based models for retrieval tasks
and helped popularize the multi-stage retrieve-then-rerank paradigm that would become
standard in many RAG systems.

2.4 The Emergence of Dense Retrieval (2019-2020)

2.4.1 Dual Encoder Architectures

Gillick et al. [2019] presented the Dual Encoder architecture in “Learning Dense Repre-
sentations for Entity Retrieval,” which used separate encoders for queries and documents
with a shared representation space:

“We present a dual encoder architecture for learning dense entity representations for
efficient retrieval.”

This approach enabled efficient similarity search using approximate nearest neighbor
techniques, a key component of modern RAG systems.

2.4.2 Contrastive Learning for Retrieval

Henderson et al. [2017] applied contrastive learning to conversation response retrieval in
“Efficient Natural Language Response Suggestion for Smart Reply,” training models to
distinguish between relevant and irrelevant responses:

“We present a method that uses the natural structure of conversational data to learn
vector representations of queries and responses that are effective for response suggestion.”

This contrastive learning approach would later be refined and applied to document
retrieval in dense retrieval models like DPR.

2.4.3 Pre-RAG Dense Retrieval

Lee et al. [2019] introduced ORQA (Open-Retrieval Question Answering) in “Latent Re-
trieval for Weakly Supervised Open Domain Question Answering,” which used a pre-
trained language model for both retrieval and reading:

“We present a new approach to open-domain QA that learns a retriever and reader
simultaneously, using only question-answer pairs as supervision.”

This work represented a significant step toward RAG, using dense retrieval and inte-
grating retrieval more tightly with the answer generation process, though it still treated
them as separate components.

2.5 Transition to RAG (2020)

The stage was set for the emergence of RAG with several key developments in 2020:

1. REALM [Guu et al., 2020]: Integrated retrieval directly into pre-training, using
a knowledge retriever to access relevant documents during both pre-training and
fine-tuning.

2. DPR [Karpukhin et al., 2020]: Demonstrated the effectiveness of dense passage re-
trieval for open-domain question answering, providing a strong retrieval component
for RAG systems.

3. RAG [Lewis et al., 2020]: Formally introduced Retrieval-Augmented Generation,
combining a pre-trained seq2seq model with a dense retriever in a unified
architecture.

These works marked the beginning of the RAG era, building upon decades of research
in information retrieval, question answering, and neural language models.

3 Foundations of RAG Systems

3.1 Emergence of RAG

The term “Retrieval-Augmented Generation” was formally introduced by Lewis et al.
[2020] in their seminal paper published in NeurIPS. The authors proposed a
general-purpose fine-tuning recipe for RAG models that combine pre-trained parametric and
non-parametric memory for language generation. As Lewis et al. [2020] explain:

“We introduce RAG models where the parametric memory is a pre-trained seq2seq
model and the non-parametric memory is a dense vector index of Wikipedia, accessed
with a pre-trained neural retriever.”

Their approach utilized a Dense Passage Retriever (DPR) to access relevant informa-
tion from a knowledge corpus, which was then fed into a sequence-to-sequence model for
generation. The authors demonstrated that RAG models outperformed traditional lan-
guage models on knowledge-intensive tasks, showing significant improvements in factual
accuracy and providing provenance for their predictions.
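
The retrieve-then-generate step can be written compactly. The block below restates the RAG-Sequence formulation of Lewis et al. [2020] in LaTeX: x is the input, z a retrieved passage, y the output sequence of length N, p_η the retriever (with document encoder d and query encoder q), and p_θ the seq2seq generator. The generator's output distribution is marginalized over the top-k retrieved passages.

```latex
p_{\text{RAG-Sequence}}(y \mid x) \;\approx\;
  \sum_{z \,\in\, \text{top-}k\left(p_\eta(\cdot \mid x)\right)}
  p_\eta(z \mid x) \prod_{i=1}^{N} p_\theta\left(y_i \mid x, z, y_{1:i-1}\right),
\qquad
p_\eta(z \mid x) \propto \exp\left(\mathbf{d}(z)^{\top} \mathbf{q}(x)\right)
```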

Concurrent with Lewis et al.’s work, Guu et al. [2020] introduced REALM (Retrieval-
Augmented Language Model Pre-Training) at ICML 2020. REALM represented a signif-
icant advancement by incorporating retrieval mechanisms directly into the pre-training
process. As described by the authors:

“To capture knowledge in a more modular and interpretable way, we augment language
model pre-training with a latent knowledge retriever, which allows the model to retrieve
and attend over documents from a large corpus such as Wikipedia, used during pre-
training, fine-tuning and inference.”

A key innovation in REALM was the joint training of the retriever and language
model components through an unsupervised learning approach. This allowed the system
to learn what information to retrieve without explicit supervision, using masked language
modeling as the learning signal.

3.2 Dense Passage Retrieval

The Dense Passage Retrieval (DPR) system, introduced by Karpukhin et al. [2020] at
EMNLP, represents a fundamental component of many RAG systems. DPR uses dense
vector representations of passages and queries to perform efficient similarity search. The
authors demonstrated that:

“Our dense retriever outperforms a strong Lucene-BM25 system greatly by 9%-19%
absolute in terms of top-20 passage retrieval accuracy, and helps our end-to-end QA
system establish new state-of-the-art on multiple open-domain QA benchmarks.”

DPR employs a bi-encoder framework where separate encoders map queries and pas-
sages to a shared vector space. The system is trained using contrastive learning with
positive and negative passage examples, creating a retrieval mechanism that captures
semantic relationships beyond keyword matching.
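
The contrastive objective can be sketched with in-batch negatives, where each query's positive passage serves as a negative for every other query in the batch. This is a NumPy illustration under stated assumptions: random vectors stand in for encoder outputs, no gradients are computed, and the function name is hypothetical.

```python
import numpy as np

def in_batch_contrastive_loss(q_emb, p_emb):
    """Mean negative log-likelihood of each query's positive passage.

    q_emb, p_emb: arrays of shape (batch, dim). Row i of p_emb is the positive
    passage for query i; all other rows act as negatives for that query.
    """
    scores = q_emb @ p_emb.T                              # dot-product similarities
    scores = scores - scores.max(axis=1, keepdims=True)   # numerical stability
    log_probs = scores - np.log(np.exp(scores).sum(axis=1, keepdims=True))
    return float(-np.mean(np.diag(log_probs)))

rng = np.random.default_rng(0)
passages = rng.normal(size=(4, 64))                            # stand-in passage embeddings
aligned_queries = passages + 0.01 * rng.normal(size=(4, 64))   # close to their positives
shuffled_queries = aligned_queries[::-1]                       # deliberately mismatched

loss_aligned = in_batch_contrastive_loss(aligned_queries, passages)
loss_shuffled = in_batch_contrastive_loss(shuffled_queries, passages)
print(loss_aligned, loss_shuffled)
```

Training lowers this loss by pulling each query toward its positive passage and away from the other passages in the batch, which is why correctly paired embeddings score much lower than mismatched ones.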

3.3 Evolution of RAG Systems

Early RAG systems relied on static knowledge bases, typically using Wikipedia or other
fixed corpora as their non-parametric memory. Recent research has focused on extending
RAG to incorporate more dynamic and diverse knowledge sources.

Borgeaud et al. [2022], in their paper “Improving language models by retrieving from
trillions of tokens,” introduced RETRO (Retrieval-Enhanced Transformer), which scales
the retrieval corpus to trillions of tokens. This approach demonstrated that retrieval-
based language models can effectively leverage much larger knowledge bases than previous
systems.

Li et al. [2024] explored various RAG system designs in their COLING paper “En-
hancing Retrieval-Augmented Generation: A Study of Best Practices.” Their research
systematically investigated key factors including language model size, prompt design,
document chunk size, knowledge base size, retrieval depth, query expansion techniques,
and contrastive in-context learning.

Recent advancements have moved beyond single-step retrieval to more sophisticated
multi-step and iterative approaches. Gao et al. [2023] introduced a taxonomy of RAG
systems in their comprehensive survey, categorizing them into single-step RAG,
multi-step RAG, and iterative RAG, the last of which incorporates feedback loops between
retrieval and generation components.

4 Retrieval Algorithms in RAG Systems

4.1 Fundamental Retrieval Paradigms

4.1.1 Dense Retrieval

Dense retrieval represents documents and queries as dense vector embeddings in a shared
semantic space, enabling similarity-based retrieval beyond lexical matching. The seminal
work in this area came from Karpukhin et al. [2020], who introduced Dense Passage
Retrieval (DPR) for open-domain question answering:

“We introduce Dense Passage Retrieval (DPR), a new approach for retrieval that
uses dense representations alone, where embeddings are learned from a small number of
questions and passages by a simple dual-encoder framework.”

The success of DPR catalyzed research into dense retrieval methods for RAG systems,
with subsequent work focusing on improving embedding quality, retrieval efficiency, and
domain adaptation.

4.1.2 Sparse Retrieval

Despite advances in dense retrieval, sparse retrieval methods remain important in RAG
systems due to their interpretability and efficiency. Robertson and Zaragoza [2009] pro-
vided the theoretical foundation for BM25, which remains a strong baseline in many
retrieval tasks:

“The BM25 weighting scheme has become a de facto standard for probabilistic ap-
proaches to document retrieval. It represents a specific instantiation of the probabilistic
relevance framework, which provides a principled foundation for designing retrieval func-
tions.”

Lin et al. [2021] reported that sparse retrieval methods like BM25 can still outperform
dense retrievers in certain scenarios, particularly when dealing with specialized
terminology or when training data is limited.

4.1.3 Hybrid Retrieval

Recognizing the complementary strengths of dense and sparse retrieval, researchers have
developed hybrid approaches. Formal et al. [2021] introduced the SPLADE (Sparse Lexical
and Expansion) model in their paper “SPLADE: Sparse Lexical and Expansion Model for
First Stage Ranking”:

“SPLADE combines the efficiency of sparse retrieval with the effectiveness of dense
retrieval by learning sparse expanded representations of queries and documents.”

Their approach uses BERT to predict importance weights for both document terms
and related terms, creating sparse representations that capture semantic relationships
while maintaining the efficiency benefits of inverted indices.

4.2 Advanced Retrieval Techniques

4.2.1 Multi-stage Retrieval

Multi-stage retrieval pipelines have emerged as a dominant paradigm in academic research
on RAG systems. Nogueira and Cho [2019] introduced a two-stage approach in their paper
“Passage Re-ranking with BERT”:

“We present a simple re-ranking approach using BERT that achieves state-of-the-
art results on the MSMARCO-Passage Re-Ranking task. Our approach is based on a
two-stage pipeline, where an initial retrieval system based on BM25 is followed by a
BERT-based re-ranker.”

This approach has been widely adopted in RAG systems, with initial retrieval us-
ing efficient methods (BM25 or approximate nearest neighbor search) followed by more
computationally intensive re-ranking of the top candidates.
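
The control flow of such a pipeline can be sketched as follows. Both scoring functions are deliberately simple stand-ins chosen for illustration: term_overlap plays the role of a cheap first stage such as BM25, and toy_reranker plays the role of a costly cross-encoder. Neither is the method of Nogueira and Cho [2019].

```python
def retrieve_then_rerank(query, docs, first_stage, reranker, k_candidates=3, k_final=2):
    """Cheap scoring over all documents, then costly scoring over a shortlist."""
    candidates = sorted(docs, key=lambda d: first_stage(query, d), reverse=True)[:k_candidates]
    return sorted(candidates, key=lambda d: reranker(query, d), reverse=True)[:k_final]

def term_overlap(query, doc):
    """Stand-in first stage: number of terms shared by query and document."""
    return len(set(query.lower().split()) & set(doc.lower().split()))

def toy_reranker(query, doc):
    """Stand-in re-ranker: rewards documents containing the query as an exact phrase."""
    return float(query.lower() in doc.lower()) + 0.1 * term_overlap(query, doc)

docs = [
    "passage re-ranking with bert",
    "bert ranking of passage candidates",
    "bert passage ranking improves retrieval",
    "graph indexing for vectors",
]
results = retrieve_then_rerank("passage ranking", docs, term_overlap, toy_reranker)
print(results)
```

The expensive scorer only ever sees k_candidates documents, so its cost is independent of corpus size. This is the main reason the design scales.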

4.2.2 Query Reformulation and Expansion

Query reformulation techniques have been extensively studied in the academic litera-
ture. Mao et al. [2021] introduced Generation-Augmented Retrieval (GAR) in their
paper “Generation-Augmented Retrieval for Open-Domain Question Answering”:

“We propose Generation-Augmented Retrieval (GAR), which reformulates the original
query to produce an enhanced query that leads to better retrieval performance.”

Building on this work, Gao, Ma, Lin, and Callan [2023] proposed Hypothetical Document
Embeddings (HyDE) in “Precise Zero-Shot Dense Retrieval without Relevance Labels,” which
uses an LLM to generate a hypothetical document that would be relevant to the query and
then retrieves with the embedding of that hypothetical document instead of the query
embedding.

4.2.3 Contrastive Learning for Retrieval

Contrastive learning has emerged as a powerful technique for training retrieval models.
Xiong et al. [2021] introduced ANCE (Approximate Nearest Neighbor Negative Con-
trastive Learning) in their paper “Approximate Nearest Neighbor Negative Contrastive
Learning for Dense Text Retrieval”:

“ANCE trains the dense retriever with negatives from the model’s own retrieval results,
which are closer to the query than random negatives and thus provide more informative
learning signals.”

Extending this work, Qu et al. [2021] presented RocketQA in “RocketQA: An Optimized
Training Approach to Dense Passage Retrieval for Open-Domain Question Answering,”
which introduces three optimization strategies: cross-batch negatives, denoised
hard negative sampling, and data augmentation.

4.3 Context-Aware and Efficient Retrieval

Recent academic research has focused on making retrieval more context-aware in RAG
systems. Asai et al. [2023] introduced Self-RAG in their paper “Self-RAG: Learning to
Retrieve, Generate, and Critique through Self-Reflection,” a retrieval-aware LLM that
can decide when to retrieve, what to retrieve, and whether to use the retrieved content
through a process of self-reflection.

Efficiency considerations are crucial for practical RAG systems. Izacard et al. [2022]
addressed this in “Atlas: Few-shot Learning with Retrieval Augmented Language Mod-
els,” introducing an efficient pre-training approach for retrieval-augmented language mod-
els that enables strong performance with minimal fine-tuning.

5 Vector Indexing Methods for Efficient Retrieval

5.1 Approximate Nearest Neighbor Search

The challenge of exact nearest neighbor search in high-dimensional spaces has been well-
documented in academic literature. Weber et al. [1998] established fundamental theoret-
ical limitations in their seminal paper “A Quantitative Analysis and Performance Study
for Similarity-Search Methods in High-Dimensional Spaces”:

“As dimensionality increases, the distance to the nearest neighbor approaches the
distance to the farthest neighbor, making exact nearest neighbor search computationally
intractable for high-dimensional data.”

This phenomenon, known as the “curse of dimensionality,” has motivated the develop-
ment of approximate nearest neighbor (ANN) search methods that trade perfect accuracy
for significant gains in computational efficiency.
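
The concentration of distances can be observed empirically. The sketch below draws uniformly random points, an assumption made purely for illustration, and reports the ratio of the nearest to the farthest distance from a random query as the dimension grows.

```python
import numpy as np

def nearest_to_farthest_ratio(dim, n_points=1000, seed=0):
    """Ratio of min to max Euclidean distance from a random query to random points."""
    rng = np.random.default_rng(seed)
    points = rng.random((n_points, dim))
    query = rng.random(dim)
    dists = np.linalg.norm(points - query, axis=1)
    return float(dists.min() / dists.max())

ratios = {d: nearest_to_farthest_ratio(d) for d in (2, 10, 100, 1000)}
for d, r in ratios.items():
    print(f"dim={d:5d}  nearest/farthest = {r:.3f}")
```

A ratio close to 1 means the nearest and farthest points are almost equally far away, which leaves space-partitioning indexes with little to prune.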

Locality-Sensitive Hashing (LSH), introduced by Indyk and Motwani [1998] in
“Approximate Nearest Neighbors: Towards Removing the Curse of Dimensionality,” provides
a rigorous theoretical framework with provable performance guarantees for approximate
nearest neighbor search.

5.2 Graph-Based Indexing Methods

5.2.1 Hierarchical Navigable Small World (HNSW)

The Hierarchical Navigable Small World (HNSW) algorithm, introduced by Malkov and
Yashunin [2018] in their paper “Efficient and Robust Approximate Nearest Neighbor
Search Using Hierarchical Navigable Small World Graphs,” has become one of the most
widely used indexing methods in RAG systems:

“We present a new approach for the approximate K-nearest neighbor search based on
navigable small world graphs with controllable hierarchy (Hierarchical NSW, HNSW).
The proposed solution is fully graph-based, without any need for additional search struc-
tures.”

HNSW builds upon the concept of navigable small world graphs, organizing vectors in
a multi-layer graph structure where each layer contains a subset of the points from lower
layers. This hierarchical structure enables efficient search by first navigating through
sparse top layers to quickly reach the approximate region of the query point, then refining
the search in denser lower layers.
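
The coarse-to-fine descent can be sketched in a heavily simplified form. This is not the HNSW algorithm itself. As assumptions for brevity, neighbor lists are built by brute force instead of incremental insertion, each upper layer keeps a random quarter of the layer below, and the search is purely greedy rather than using a candidate queue. The function names are hypothetical.

```python
import numpy as np

def build_layers(points, n_layers=3, m=4, seed=0):
    """Build nested layers; link each point to its m nearest neighbors in its layer."""
    rng = np.random.default_rng(seed)
    members = np.arange(len(points))
    layers = []
    for _ in range(n_layers):
        sub = points[members]
        dists = np.linalg.norm(sub[:, None, :] - sub[None, :, :], axis=2)
        order = np.argsort(dists, axis=1)[:, 1:m + 1]   # position 0 is the point itself
        layers.append({int(members[i]): [int(members[j]) for j in order[i]]
                       for i in range(len(members))})
        # The next (sparser) layer keeps roughly a quarter of the current members.
        members = rng.choice(members, size=max(1, len(members) // 4), replace=False)
    return layers   # layers[0] is the dense bottom layer

def greedy_search(points, layers, query):
    """Start in the sparsest layer and greedily move to closer neighbors, layer by layer."""
    current = next(iter(layers[-1]))   # arbitrary entry point in the top layer
    for graph in reversed(layers):
        improved = True
        while improved:
            improved = False
            for nb in graph[current]:
                if np.linalg.norm(points[nb] - query) < np.linalg.norm(points[current] - query):
                    current, improved = nb, True
    return current

rng = np.random.default_rng(1)
points = rng.random((200, 8))
layers = build_layers(points)
query = rng.random(8)
found = greedy_search(points, layers, query)
print(found, np.linalg.norm(points[found] - query))
```

Greedy descent only guarantees a local optimum in the bottom-layer graph. Practical implementations keep a queue of candidates in the bottom layer to improve recall.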

Subsequent academic research has further analyzed and optimized HNSW. Prokhorenkova
and Shekhovtsov [2020] provided theoretical analysis in “Graph-based Nearest Neighbor
Search: From Practice to Theory,” establishing formal guarantees for graph-based meth-
ods like HNSW.

5.2.2 Other Graph-Based Methods

Building on the NSW concept, Subramanya et al. [2019] introduced the Vamana graph
in their paper “DiskANN: Fast Accurate Billion-point Nearest Neighbor Search on a Single
Node,” which improved upon HNSW by using a different graph construction algorithm
that better balances search efficiency and index construction time.

5.3 Quantization-Based Methods

5.3.1 Product Quantization

Product Quantization (PQ), introduced by Jégou et al. [2011] in their influential paper
“Product Quantization for Nearest Neighbor Search,” has become a fundamental tech-
nique for compressing vector representations while maintaining search capability:

“This paper introduces a product quantization based approach for approximate nearest
neighbor search. The idea is to decompose the space into a Cartesian product of low-
dimensional subspaces and to quantize each subspace separately.”

PQ works by dividing high-dimensional vectors into subvectors, quantizing each subvector
independently, and representing the original vector as a concatenation of these
quantized subvectors. This approach enables both memory-efficient storage and fast dis-
tance computation.
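
As a rough illustration of the encoding and distance computation just described, the sketch below encodes a 4-dimensional vector with two sub-quantizers. The codebooks are hand-picked assumptions (in practice they would be learned, typically with k-means), and the function names are hypothetical rather than taken from any library.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length sequences."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def split(vec, m):
    """Cut a vector into m contiguous subvectors of equal length."""
    step = len(vec) // m
    return [vec[i * step:(i + 1) * step] for i in range(m)]

def pq_encode(vec, codebooks):
    """Replace each subvector by the index of its nearest centroid."""
    codes = []
    for sub, book in zip(split(vec, len(codebooks)), codebooks):
        codes.append(min(range(len(book)), key=lambda j: sq_dist(sub, book[j])))
    return codes

def pq_decode(codes, codebooks):
    """Reconstruct an approximate vector by concatenating the chosen centroids."""
    out = []
    for code, book in zip(codes, codebooks):
        out.extend(book[code])
    return out

def adc_distance(query, codes, codebooks):
    """Asymmetric distance: exact query against a quantized database vector."""
    subs = split(query, len(codebooks))
    # One lookup table per subspace: distance from the query part to every centroid.
    tables = [[sq_dist(sub, cent) for cent in book] for sub, book in zip(subs, codebooks)]
    return sum(tables[i][code] for i, code in enumerate(codes))

# Hypothetical codebooks: 2 subspaces, 2 centroids each (1 bit per subvector).
codebooks = [
    [(0.0, 0.0), (1.0, 1.0)],
    [(0.0, 1.0), (1.0, 0.0)],
]
vec = [0.9, 1.1, 0.2, 0.8]
codes = pq_encode(vec, codebooks)
approx = pq_decode(codes, codebooks)
```

Here the vector is stored as two small integers instead of four floats, and `adc_distance` needs only table lookups per database vector once the per-query tables are built, which is the source of the memory and speed benefits mentioned above.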

5.3.2 Optimized Product Quantization

Building on the foundation of PQ, Ge et al. [2013] introduced Optimized Product Quantization
(OPQ) in their paper “Optimized Product Quantization,” which improves upon standard PQ
by finding an optimal rotation of the data before applying product quantization, leading to
better quantization accuracy and retrieval performance.

Further advancements in this area include Locally Optimized Product Quantization
(LOPQ) by Kalantidis and Avrithis [2014] and Additive Quantization by Babenko and
Lempitsky [2014], both of which offer improved quantization accuracy at the cost of
additional computational complexity.

5.4 Inverted Index-Based Methods

5.4.1 Inverted File Index (IVF)

The Inverted File Index (IVF) approach, described in detail by Sivic and Zisserman
[2003] in “Video Google: A Text Retrieval Approach to Object Matching in Videos,”
adapts techniques from text retrieval to vector search. In the context of vector search,
IVF partitions the vector space into clusters and builds an inverted index that maps each
cluster to its member vectors.
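
The cluster-to-members mapping can be sketched in a few lines. The centroids below are assumed to be given (they would normally come from k-means), and `build_ivf`, `ivf_search`, and the `nprobe` parameter name are illustrative choices rather than a specific library's API.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length sequences."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def build_ivf(vectors, centroids):
    """Map each cluster id to the ids of the vectors assigned to it."""
    lists = {c: [] for c in range(len(centroids))}
    for vid, vec in enumerate(vectors):
        nearest = min(range(len(centroids)), key=lambda j: sq_dist(vec, centroids[j]))
        lists[nearest].append(vid)
    return lists

def ivf_search(query, vectors, centroids, lists, k=1, nprobe=1):
    """Scan only the nprobe clusters whose centroids are closest to the query."""
    order = sorted(range(len(centroids)), key=lambda j: sq_dist(query, centroids[j]))
    candidates = [vid for c in order[:nprobe] for vid in lists[c]]
    return sorted(candidates, key=lambda vid: sq_dist(query, vectors[vid]))[:k]

# Toy data (assumed for illustration).
centroids = [(0.0, 0.0), (10.0, 10.0)]
vectors = [(0.0, 1.0), (1.0, 0.0), (9.0, 10.0), (10.0, 9.0), (4.9, 4.9)]
lists = build_ivf(vectors, centroids)

query = (6.0, 6.0)
fast = ivf_search(query, vectors, centroids, lists, k=1, nprobe=1)
thorough = ivf_search(query, vectors, centroids, lists, k=1, nprobe=2)
```

With `nprobe=1` the search inspects only the cluster around (10, 10) and misses the true nearest neighbor, which sits near the cluster boundary; probing both clusters recovers it. This is the speed-versus-recall trade-off that the number of probed clusters controls.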

5.4.2 IVF-PQ: Combining Inverted Indices and Product Quantization

Baranchuk et al. [2018] examined the combination of inverted indices and product quantization
in their paper “Revisiting the Inverted Indices for Billion-Scale Approximate
Nearest Neighbors”:

“We argue that the potential of the simple inverted index was not fully exploited in
previous works and advocate its usage both for highly-entangled deep descriptors and
relatively disentangled SIFT descriptors.”

Their research demonstrated that properly optimized inverted indices can outperform
more complex methods for large-scale retrieval tasks.

5.5 Hybrid and Multi-Index Approaches

Recent academic work has focused on hybrid approaches that combine multiple indexing
strategies. Johnson et al. [2019] described such an approach in “Billion-scale similarity
search with GPUs”:

“We present a hybrid system that combines an inverted index with product quantiza-
tion to achieve both memory efficiency and search speed.”

Their FAISS (Facebook AI Similarity Search) library implements various indexing
methods, including IVF-PQ, which combines inverted file indices with product quantization,
and has become a standard tool in academic research on vector indexing.

6 Empirical Evaluation Methodologies in RAG Research

6.1 Benchmark Datasets

6.1.1 Question Answering Datasets

Question answering has been the primary evaluation domain for RAG systems since
their inception. Kwiatkowski et al. [2019] introduced Natural Questions (NQ), which has
become a standard benchmark:

“Natural Questions contains real user queries issued to Google Search, along with
corresponding Wikipedia pages that might contain the answer and human-annotated
answer spans.”

Other widely used QA datasets include:

• TriviaQA [Joshi et al., 2017]: Contains question-answer pairs authored by trivia
enthusiasts, with evidence documents independently gathered from Wikipedia and the web.

• SQuAD [Rajpurkar et al., 2016]: Stanford Question Answering Dataset, featuring
questions posed by crowdworkers on Wikipedia articles.

• HotpotQA [Yang et al., 2018]: Multi-hop question answering dataset requiring
reasoning across multiple documents.

Lewis et al. [2020] evaluated the original RAG model on these datasets, establishing
them as standard benchmarks for RAG evaluation:

“We evaluate RAG models on a range of knowledge-intensive NLP tasks, including


open-domain question answering with Natural Questions, WebQuestions, and TriviaQA.”

6.1.2 Knowledge-Intensive Tasks

Beyond question answering, RAG systems are evaluated on other knowledge-intensive
tasks. The KILT benchmark, introduced by Petroni et al. [2021], provides a unified
framework:

“KILT consists of 11 datasets across five knowledge-intensive tasks: fact checking,
entity linking, slot filling, open-domain question answering, and dialogue.”

This benchmark enables more comprehensive evaluation across diverse tasks while
using a consistent knowledge source (Wikipedia), facilitating more direct comparisons
between different approaches.

6.1.3 Fact Verification and Hallucination Assessment

As RAG systems aim to reduce hallucination in language models, specialized datasets for
fact verification have become important. Thorne et al. [2018] introduced FEVER (Fact
Extraction and VERification):

“FEVER consists of 185,445 claims generated by altering sentences extracted from
Wikipedia and subsequently verified without knowledge of the sentence they were derived
from.”

More recently, Rashkin et al. [2023] developed FACTSCORE, specifically designed to
evaluate factuality in generated text:

“FACTSCORE is a framework for evaluating the factual accuracy of text generated
by large language models, with a focus on attributable, precise claims.”

6.2 Evaluation Metrics

6.2.1 Retrieval-Specific Metrics

The retrieval component of RAG systems is typically evaluated using standard infor-
mation retrieval metrics. Karpukhin et al. [2020] used the following metrics in their
evaluation of Dense Passage Retrieval:

• Top-k Accuracy: The percentage of questions for which the answer is contained
in the top-k retrieved passages.

• Mean Reciprocal Rank (MRR): The average of the reciprocal ranks of the first
relevant passage across all queries.

• Precision@k: The proportion of relevant passages among the top-k retrieved pas-
sages.

• Recall@k: The proportion of all relevant passages that are retrieved in the top-k
results.
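
The four metrics above can be computed as in the following sketch. It assumes relevance is given as sets of passage ids per query, which is a simplification: answer-containment checks on passage text, as used for top-k accuracy in open-domain QA, would replace the id lookup. All function names and the toy data are illustrative.

```python
def top_k_accuracy(ranked_lists, relevant_sets, k):
    """Fraction of queries with at least one relevant passage in the top k."""
    hits = sum(
        1 for ranked, rel in zip(ranked_lists, relevant_sets)
        if any(doc in rel for doc in ranked[:k])
    )
    return hits / len(ranked_lists)

def mean_reciprocal_rank(ranked_lists, relevant_sets):
    """Average of 1/rank of the first relevant passage (0 if none is retrieved)."""
    total = 0.0
    for ranked, rel in zip(ranked_lists, relevant_sets):
        for rank, doc in enumerate(ranked, start=1):
            if doc in rel:
                total += 1.0 / rank
                break
    return total / len(ranked_lists)

def precision_at_k(ranked, rel, k):
    """Share of the top-k results that are relevant."""
    return sum(1 for doc in ranked[:k] if doc in rel) / k

def recall_at_k(ranked, rel, k):
    """Share of all relevant passages that appear in the top k."""
    return sum(1 for doc in ranked[:k] if doc in rel) / len(rel)

# Toy rankings for two queries (assumed data).
ranked_lists = [["d3", "d1", "d7"], ["d2", "d9", "d4"]]
relevant_sets = [{"d1", "d5", "d6", "d7"}, {"d8"}]

acc_at_2 = top_k_accuracy(ranked_lists, relevant_sets, k=2)
mrr = mean_reciprocal_rank(ranked_lists, relevant_sets)
p_at_3 = precision_at_k(ranked_lists[0], relevant_sets[0], k=3)
r_at_3 = recall_at_k(ranked_lists[0], relevant_sets[0], k=3)
```

For the first query, two of the three retrieved passages are relevant but only two of the four relevant passages are found, so precision and recall at the same cutoff differ.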

Xiong et al. [2021] emphasized the importance of recall in their work on dense retrieval:

“Recall is particularly important for retrieval systems that feed into downstream com-
ponents, as relevant documents missed at the retrieval stage cannot be recovered later.”

6.2.2 Generation-Specific Metrics

For evaluating the generation quality of RAG systems, researchers employ both reference-
based and reference-free metrics. Lewis et al. [2020] used the following metrics:

• Exact Match (EM): The percentage of generated answers that exactly match the
reference answer.

• F1 Score: The harmonic mean of precision and recall at the token level between
the generated and reference answers.

• ROUGE-L: Recall-Oriented Understudy for Gisting Evaluation, measuring the
longest common subsequence between generated and reference texts.
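
A simplified sketch of these three metrics follows. It assumes lowercasing and whitespace tokenization only (standard evaluation scripts usually also strip punctuation and articles), and it reports ROUGE-L as a balanced F-measure over the longest common subsequence, which is one common variant. The function names are hypothetical.

```python
from collections import Counter

def tokens(text):
    """Lowercase and split on whitespace (a deliberately minimal normalization)."""
    return text.lower().split()

def exact_match(pred, ref):
    return float(tokens(pred) == tokens(ref))

def token_f1(pred, ref):
    """Harmonic mean of token-level precision and recall (bag-of-words overlap)."""
    p, r = tokens(pred), tokens(ref)
    overlap = sum((Counter(p) & Counter(r)).values())
    if overlap == 0:
        return 0.0
    precision, recall = overlap / len(p), overlap / len(r)
    return 2 * precision * recall / (precision + recall)

def lcs_length(a, b):
    """Length of the longest common subsequence via dynamic programming."""
    dp = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i, x in enumerate(a, start=1):
        for j, y in enumerate(b, start=1):
            dp[i][j] = dp[i - 1][j - 1] + 1 if x == y else max(dp[i - 1][j], dp[i][j - 1])
    return dp[-1][-1]

def rouge_l_f1(pred, ref):
    """Balanced F-measure based on the LCS of the two token sequences."""
    p, r = tokens(pred), tokens(ref)
    lcs = lcs_length(p, r)
    if lcs == 0:
        return 0.0
    precision, recall = lcs / len(p), lcs / len(r)
    return 2 * precision * recall / (precision + recall)

em = exact_match("the Eiffel Tower in Paris", "Eiffel Tower")
f1 = token_f1("the Eiffel Tower in Paris", "Eiffel Tower")
rl = rouge_l_f1("the Eiffel Tower in Paris", "Eiffel Tower")
```

Unlike token-level F1, the LCS-based score is sensitive to word order: swapping the two words of a two-word answer leaves F1 at 1.0 but halves the ROUGE-L score in this sketch.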

Izacard and Grave [2021] noted limitations of these metrics in their work on Fusion-
in-Decoder:

“Exact Match and F1 metrics can be overly strict, especially when multiple valid
answer formulations exist. We therefore also report ROUGE-L, which better captures
semantic similarity.”

6.2.3 End-to-End Evaluation Metrics

Evaluating RAG systems end-to-end presents unique challenges, as both retrieval and
generation quality must be considered. Shao et al. [2023] proposed specialized metrics
for Self-RAG:

“We introduce Retrieval Precision and Retrieval Utility metrics, which measure both
whether the system retrieves when it should and whether the retrieved information is
actually used in generation.”

Similarly, Chen et al. [2023] developed metrics specifically for evaluating RAG systems
in their RAGAS framework:

“RAGAS provides automated metrics for evaluating RAG pipelines, including answer
relevance, answer faithfulness, context relevance, and context precision.”

6.3 Experimental Design Approaches

6.3.1 Ablation Studies

Ablation studies, which systematically remove or modify components of a system to
assess their impact, are widely used in RAG research. Lewis et al. [2020] employed this
approach:

“We conduct ablation studies to understand the impact of different components of
RAG models, including the retrieval mechanism, the fusion approach, and the pre-training
strategy.”

This methodology helps isolate the contribution of specific components and design
choices to overall system performance.

6.3.2 Comparative Analysis

Comparative analysis against established baselines is standard practice in RAG research.
Guu et al. [2020] compared REALM against both retrieval-free and retrieval-based baselines:

“We compare against state-of-the-art models for both explicit and implicit knowledge
storage on three popular Open-QA benchmarks, and find that we outperform all previous
methods by a significant margin.”

These comparisons typically include both sparse retrieval methods (e.g., BM25) and
other dense retrieval approaches to provide a comprehensive performance assessment.

6.3.3 Cross-Domain Evaluation

To assess the generalization capabilities of RAG systems, researchers increasingly employ
cross-domain evaluation. Thakur et al. [2021] introduced the BEIR benchmark specifically
for this purpose:

“BEIR provides a diverse set of information retrieval tasks to evaluate the zero-shot
transfer capabilities of retrieval models.”

This approach helps identify whether performance improvements are robust across
different domains or limited to specific datasets.

6.3.4 Human Evaluation

Despite the prevalence of automated metrics, human evaluation remains important for
assessing aspects of RAG systems that are difficult to quantify automatically. Shuster
et al. [2021] emphasized this in their work on knowledge-grounded dialogue:

“We conduct human evaluations to assess factual accuracy, relevance, and engaging-
ness of responses, finding that automated metrics often fail to capture important quali-
tative differences.”

Human evaluations typically involve presenting judges with system outputs and asking
them to rate various aspects, such as factual accuracy, relevance, and coherence.

6.4 Evaluation Challenges and Limitations

6.4.1 Reference Limitations

A significant challenge in evaluating RAG systems is the limitation of reference-based
evaluation. Krishna et al. [2021] highlighted this issue:

“Reference-based evaluation assumes that the reference answer is complete and correct,
which is often not the case for complex questions with multiple valid answers.”

This challenge has led to increased interest in reference-free evaluation methods that
assess the quality and factuality of generated text without requiring exact matches to
reference answers.

6.4.2 Retrieval Evaluation Complexity

Evaluating the retrieval component of RAG systems presents unique challenges. Karpukhin
et al. [2020] noted:

“The standard approach of using a single positive passage per question underestimates
retrieval performance, as multiple passages may contain the answer.”

This has led researchers to develop more nuanced evaluation approaches, such as
considering all passages containing the answer as positive examples or using human judg-
ments to assess relevance.

6.4.3 Trade-off Assessment

RAG systems involve inherent trade-offs between retrieval accuracy, generation quality,
computational efficiency, and memory usage. Borgeaud et al. [2022] addressed this chal-
lenge in their evaluation of RETRO:

“We systematically evaluate the trade-offs between retrieval corpus size, computational
cost, and model performance, providing insights into the scaling behavior of retrieval-
augmented language models.”

Comprehensive evaluation frameworks must account for these multiple dimensions to
provide a holistic assessment of RAG systems.

6.5 Recent Advances in Evaluation Methodologies

6.5.1 Automated Factuality Assessment

Recent work has focused on developing automated methods for assessing the factuality
of generated text. Honovich et al. [2022] introduced TRUE (Trustworthy and Reliable
Understanding Evaluation):

“TRUE is a framework for automatically evaluating the factual consistency of generated
text with respect to source documents, using question generation and answering to
probe for factual inconsistencies.”

This approach enables more scalable evaluation of the factual accuracy of RAG sys-
tems without requiring extensive human annotation.

6.5.2 Retrieval-Aware Evaluation

Recognizing the tight coupling between retrieval and generation in RAG systems, re-
searchers have developed evaluation methodologies that explicitly account for this rela-
tionship. Si et al. [2023] proposed:

“We introduce a retrieval-aware evaluation framework that jointly assesses retrieval
quality and its impact on generation, providing a more holistic view of RAG system
performance.”

This approach acknowledges that retrieval errors propagate to generation and that
the utility of retrieved information depends on how effectively it is incorporated into the
generated output.

6.5.3 Efficiency-Focused Evaluation

As RAG systems are deployed in real-world applications, evaluation of computational
efficiency has gained importance. Izacard et al. [2022] emphasized this aspect in their
evaluation of Atlas:

“We evaluate not only accuracy but also indexing time, retrieval latency, and memory
usage, providing a comprehensive assessment of practical deployment considerations.”

This trend reflects the growing recognition that RAG systems must balance effective-
ness with efficiency to be practically useful.

7 Research Gaps and Future Directions

The academic literature reveals several important research gaps and opportunities for
future work in algorithmic optimization of retrieval mechanisms in RAG systems:

7.1 Dynamic Knowledge Management

Most current vector indexing methods are optimized for static datasets, but RAG systems
often require dynamic updates to their knowledge bases. Iwasaki and Miyazaki [2018]
addressed this challenge in “Optimization of Indexing Based on k-Nearest Neighbor Graph
for Proximity Search in High-dimensional Data,” but more research is needed on efficient
index maintenance for dynamic knowledge bases.

Dynamic knowledge management presents several key challenges:

• Incremental Index Updates: Efficiently updating index structures when new
documents are added, without requiring complete reindexing.

• Deletion and Modification Handling: Developing methods to efficiently handle
document deletions and modifications in vector indices.

• Temporal Consistency: Ensuring consistency in retrieval results during index
updates, particularly in high-throughput systems.

• Retraining Strategies: Determining when and how to retrain embedding models
as the knowledge base evolves.

7.2 Domain-Specific Optimization

The performance of retrieval algorithms and indexing methods can vary significantly
across different domains and data types. There is a need for research on domain-specific
optimization techniques that can adapt to the characteristics of particular applications.

Domain-specific challenges include:

• Specialized Terminology: Handling domain-specific terminology that may not
be well-represented in general-purpose embeddings.

• Structural Knowledge: Incorporating domain-specific structural knowledge into
the retrieval process.

• Contextual Relevance: Adapting notions of relevance to match domain-specific
requirements and user expectations.

• Efficiency-Accuracy Trade-offs: Calibrating efficiency-accuracy trade-offs based
on domain-specific constraints and requirements.

7.3 Theoretical Foundations

While empirical evaluations have demonstrated the effectiveness of various retrieval and
indexing methods, the theoretical understanding of why certain approaches work better
than others in specific contexts remains limited. More research is needed to establish
stronger theoretical foundations for RAG systems.

Key areas for theoretical development include:

• Convergence Properties: Analyzing the convergence properties of graph-based
nearest neighbor search algorithms in the context of RAG systems.

• Error Bounds: Establishing theoretical error bounds for approximate nearest
neighbor search methods in retrieval tasks.

• Optimality Criteria: Defining optimality criteria for RAG retrieval algorithms
that balance multiple objectives.

• Information-Theoretic Analysis: Developing information-theoretic frameworks
to analyze the efficiency of knowledge transfer in RAG systems.

7.4 Evaluation Metrics

Current evaluation metrics for RAG systems often focus on retrieval accuracy or genera-
tion quality in isolation. There is a need for holistic evaluation frameworks that consider
the end-to-end performance of RAG systems, including both retrieval and generation
components.

Opportunities for metric development include:

• Utility-Based Metrics: Developing metrics that assess the utility of retrieved
information for the generation task, rather than just retrieval accuracy.

• Efficiency-Aware Metrics: Creating evaluation frameworks that incorporate
computational efficiency alongside effectiveness measures.

• Robustness Metrics: Designing metrics to evaluate the robustness of RAG systems
to variations in query formulation and knowledge base quality.

• Explainability Metrics: Establishing metrics to assess the explainability and
interpretability of RAG system outputs.

7.5 Hardware Acceleration

As RAG systems scale to larger knowledge bases, hardware acceleration becomes increas-
ingly important for maintaining real-time performance. Research on specialized hard-
ware architectures and algorithms optimized for specific hardware platforms represents a
promising direction.

Hardware acceleration opportunities include:

• GPU-Optimized Indices: Developing index structures specifically designed to
leverage GPU parallelism for faster search.

• Quantization-Aware Hardware: Creating specialized hardware that can efficiently
perform operations on quantized vectors.

• Memory Hierarchies: Designing algorithms that efficiently utilize modern memory
hierarchies, from on-chip cache to disk storage.

• Custom Accelerators: Exploring application-specific integrated circuits (ASICs)
or field-programmable gate arrays (FPGAs) for RAG-specific operations.

7.6 Integration of Symbolic and Neural Approaches

Current RAG systems predominantly rely on dense vector representations for retrieval,
but there are opportunities to integrate symbolic knowledge and reasoning into the re-
trieval process.

Research directions in this area include:

• Hybrid Indices: Developing index structures that combine vector representations
with symbolic knowledge representations.

• Logic-Based Filtering: Incorporating logical constraints into the retrieval process
to filter results based on formal reasoning.

• Knowledge Graph Integration: Creating retrieval mechanisms that seamlessly
combine information from vector indices and knowledge graphs.

• Neuro-Symbolic Reasoning: Developing end-to-end neuro-symbolic systems that
can perform both vector-based retrieval and symbolic reasoning.

7.7 Multi-Modal RAG Systems

Most current RAG research focuses on text, but there are growing opportunities for
multi-modal RAG systems that can retrieve and generate across different modalities.

Key research challenges in multi-modal RAG include:

• Cross-Modal Embeddings: Developing unified embedding spaces that effectively
capture similarities across different modalities.

• Modal-Specific Indexing: Creating indexing strategies optimized for different
data modalities while enabling unified search.

• Multi-Modal Relevance: Defining and evaluating relevance in multi-modal contexts
where different modalities may contribute differently to overall relevance.

• Multi-Modal Generation: Integrating retrieved information from multiple modalities
into coherent multi-modal outputs.

7.8 Privacy-Preserving RAG

As RAG systems are deployed in privacy-sensitive domains, there is a growing need
for privacy-preserving retrieval mechanisms that can protect both user queries and the
knowledge base content.

Privacy-preserving research directions include:

• Encrypted Search: Developing methods for similarity search over encrypted vectors
without compromising privacy.

• Differential Privacy: Applying differential privacy techniques to RAG retrieval
to provide formal privacy guarantees.

• Federated RAG: Creating distributed RAG architectures that can leverage knowledge
across multiple private data sources.

• Privacy-Utility Trade-offs: Understanding and optimizing the trade-offs between
privacy guarantees and retrieval effectiveness.

These research gaps and future directions represent significant opportunities for ad-
vancing the state of the art in RAG systems through algorithmic optimization of retrieval
mechanisms and data structures.

8 Conclusion

This literature review has examined the academic research on algorithmic optimization
of retrieval mechanisms in RAG systems, with a focus on the data structures that enable
efficient knowledge access. The review has covered the historical evolution of information
retrieval systems, the foundations of RAG systems, retrieval algorithms, vector indexing
methods, and empirical evaluation methodologies, highlighting key contributions from
peer-reviewed academic literature.

The field of RAG systems is rapidly evolving, with ongoing research addressing chal-
lenges in scalability, efficiency, and accuracy. The integration of advanced retrieval algo-
rithms with sophisticated vector indexing methods represents a promising approach to
improving the performance of RAG systems across a wide range of applications.

Future research in this area will likely focus on addressing the identified research
gaps, particularly in dynamic knowledge management, domain-specific optimization, and
hardware acceleration. As RAG systems continue to evolve, algorithmic optimization of
retrieval mechanisms will remain a critical area of research, with significant implications
for the development of more capable and efficient AI systems.

The literature review has revealed several key insights:

1. RAG systems represent a significant advancement in language model capabilities,
addressing limitations of traditional LLMs by incorporating external knowledge in
a structured way.

2. The algorithmic optimization of retrieval mechanisms is crucial for RAG performance,
with trade-offs between retrieval accuracy, computational efficiency, and
memory usage.

3. Vector indexing methods, particularly graph-based approaches like HNSW and
quantization techniques like PQ, play a central role in enabling efficient similarity
search over large knowledge bases.

4. The evaluation of RAG systems requires specialized methodologies that consider
both retrieval and generation quality, with emerging metrics for assessing factuality
and relevance.

5. There are significant opportunities for theoretical and practical advancements in
RAG retrieval algorithms, particularly in dynamic knowledge management, multi-modal
retrieval, and privacy-preserving techniques.

This systematic review of the literature provides a solid foundation for future research
and development in the field of retrieval-augmented generation, with particular emphasis
on algorithmic and data structure approaches to optimizing knowledge access.

9 Visual Elements and Comparisons

9.1 Tables and Comparisons

9.1.1 Comparison of Foundational RAG Papers

Paper | Authors | Year | Venue | Key Contribution | Retrieval Method | Knowledge Source
--- | --- | --- | --- | --- | --- | ---
REALM | Guu et al. | 2020 | ICML | Joint pre-training of retriever and language model | Dense retrieval with learned representations | Wikipedia
DPR | Karpukhin et al. | 2020 | EMNLP | Effective dense retrieval for open-domain QA | Bi-encoder with contrastive learning | Wikipedia
RAG | Lewis et al. | 2020 | NeurIPS | End-to-end retrieval-augmented generation | DPR-based neural retriever | Wikipedia
RETRO | Borgeaud et al. | 2022 | ICML | Scaling retrieval to trillions of tokens | BERT-based chunked retriever | MassiveText corpus
Atlas | Izacard et al. | 2022 | arXiv | Few-shot learning with RAG | Contriever with unsupervised pre-training | Wikipedia, CC-News
Self-RAG | Shao et al. | 2023 | arXiv | Retrieval, generation, and critique through self-reflection | LLM-based retrieval decisions | Wikipedia, web

Table 1: Comparison of Foundational RAG Papers

9.1.2 Comparison of Vector Indexing Methods for RAG Systems

Indexing Method | Key Paper | Year | Time Complexity (Query) | Space Complexity | Key Advantages | Key Limitations
--- | --- | --- | --- | --- | --- | ---
Flat (Brute Force) | - | - | O(nd) | O(nd) | Exact results, simple implementation | Prohibitively slow for large datasets
LSH | Indyk & Motwani | 1998 | O(d log n) | O(nd) | Theoretical guarantees, simple concept | Performance degrades in high dimensions
HNSW | Malkov & Yashunin | 2018 | O(d log n) | O(nd) | State-of-the-art performance, logarithmic search time | High memory overhead, complex implementation
IVF | Sivic & Zisserman | 2003 | O(d(n/k)) | O(nd) | Simple concept, efficient for first-stage retrieval | Performance depends on clustering quality
PQ | Jégou et al. | 2011 | O(d + k) | O(n + kd) | Dramatic memory reduction, fast distance computation | Lossy compression, reduced accuracy
IVF-PQ | Baranchuk et al. | 2018 | O(d(n/k) + k) | O(n + kd) | Balances speed and memory efficiency | Complex parameter tuning, still lossy

Table 2: Comparison of Vector Indexing Methods for RAG Systems

9.1.3 Comparison of Retrieval Paradigms in RAG Systems

Retrieval Paradigm | Representative Papers | Key Characteristics | Advantages | Limitations | Applications
--- | --- | --- | --- | --- | ---
Sparse Retrieval | Robertson & Zaragoza (2009) | Term-based matching with inverted indices | Interpretable, efficient, no training required | Limited semantic understanding | General search, specialized domains
Dense Retrieval | Karpukhin et al. (2020) | Neural embeddings in shared vector space | Semantic matching, handles synonyms | Requires training data | Open-domain QA, semantic search
Hybrid Retrieval | Luan et al. (2021) | Combines sparse and dense approaches | Leverages strengths of both paradigms | Increased complexity | Production systems
Multi-stage Retrieval | Nogueira & Cho (2019) | Initial retrieval followed by re-ranking | Balances efficiency and accuracy | Pipeline complexity | Web search, large-scale retrieval

Table 3: Comparison of Retrieval Paradigms in RAG Systems

9.2 Figures and Diagrams

9.2.1 RAG System Architecture

[Diagram labels: User Query, Query Processing, Retrieval Component, Vector Indexing, Retrieval Algorithms, Retrieved Documents, Generation Component, Context Integration, Language Model, Response Generation, Generated Response]

Figure 1: RAG System Architecture

9.2.2 HNSW Indexing Structure

[Diagram: Hierarchical Navigable Small World. Layers: Layer 3 (Top), Layer 2, Layer 1, Layer 0 (Base).]

Search Path: Query → Enter at top layer → Navigate to closest node at each layer →
Descend to next layer → Final search at base layer

Figure 2: HNSW Indexing Structure

10 References


Artem Babenko and Victor Lempitsky. Additive quantization for extreme vector com-
pression. In Proceedings of the IEEE Conference on Computer Vision and Pattern
Recognition, pages 931–938, 2014.

Dmitry Baranchuk, Artem Babenko, and Yury Malkov. Revisiting the inverted indices for
billion-scale approximate nearest neighbors. In Proceedings of the European Conference
on Computer Vision, pages 202–216, 2018.

David M Blei, Andrew Y Ng, and Michael I Jordan. Latent Dirichlet allocation. Journal
of Machine Learning Research, 3:993–1022, 2003.

Sebastian Borgeaud, Arthur Mensch, Jordan Hoffmann, Trevor Cai, Eliza Rutherford,
Katie Millican, George Bm Van Den Driessche, Jean-Baptiste Lespiau, Bogdan Damoc,
Aidan Clark, et al. Improving language models by retrieving from trillions of tokens.
In International Conference on Machine Learning, pages 2206–2223. PMLR, 2022.

Danqi Chen, Adam Fisch, Jason Weston, and Antoine Bordes. Reading Wikipedia to
answer open-domain questions. In Proceedings of the 55th Annual Meeting of the As-
sociation for Computational Linguistics, pages 1870–1879, 2017.

Jerry Chen, Yu Guo, Swati Agarwal, and William Yang Wang. RAGAS: Automated
evaluation of retrieval augmented generation. arXiv preprint arXiv:2309.15217, 2023.

Scott Deerwester, Susan T Dumais, George W Furnas, Thomas K Landauer, and Richard
Harshman. Indexing by latent semantic analysis. Journal of the American Society for
Information Science, 41(6):391–407, 1990.

Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. BERT: Pre-training
of deep bidirectional transformers for language understanding. In Proceedings of the
2019 Conference of the North American Chapter of the Association for Computational
Linguistics, pages 4171–4186, 2019.

Yunfan Gao, Yun Xiong, Xinyu Gao, Kangqi Jia, Jinyang Pan, Yixin Bi, Yi Dai, Jian Sun,
Meng Wang, and Haofen Wang. Retrieval-augmented generation for large language
models: A survey. arXiv preprint arXiv:2312.10997, 2023.

Tiezheng Ge, Kaiming He, Qifa Ke, and Jian Sun. Optimized product quantization. IEEE
Transactions on Pattern Analysis and Machine Intelligence, 36(4):744–755, 2013.

Daniel Gillick, Sayali Kulkarni, Larry Lansing, Alessandro Presta, Jason Baldridge, Eu-
gene Ie, and Diego Garcia-Olano. Learning dense representations for entity retrieval.
In Proceedings of the 23rd Conference on Computational Natural Language Learning,
pages 528–537, 2019.

Jiafeng Guo, Yixing Fan, Qingyao Ai, and W Bruce Croft. A deep relevance matching
model for ad-hoc retrieval. In Proceedings of the 25th ACM International on Conference
on Information and Knowledge Management, pages 55–64. ACM, 2016.

Kelvin Guu, Kenton Lee, Zora Tung, Panupong Pasupat, and Ming-Wei Chang. REALM:
Retrieval-augmented language model pre-training. In Proceedings of the 37th Interna-
tional Conference on Machine Learning, pages 3929–3938, 2020.

Matthew Henderson, Rami Al-Rfou, Brian Strope, Yun-Hsuan Sung, László Lukács, Ruiqi
Guo, Sanjiv Kumar, Balint Miklos, and Ray Kurzweil. Efficient natural language
response suggestion for smart reply. arXiv preprint arXiv:1705.00652, 2017.

Or Honovich, Thomas Scialom, Omer Levy, and Timo Schick. TRUE: Re-evaluating
factual consistency evaluation. arXiv preprint arXiv:2204.04991, 2022.

Po-Sen Huang, Xiaodong He, Jianfeng Gao, Li Deng, Alex Acero, and Larry Heck.
Learning deep structured semantic models for web search using clickthrough data. In
Proceedings of the 22nd ACM International Conference on Information & Knowledge
Management, pages 2333–2338. ACM, 2013.

Piotr Indyk and Rajeev Motwani. Approximate nearest neighbors: Towards removing
the curse of dimensionality. In Proceedings of the 30th Annual ACM Symposium on
Theory of Computing, pages 604–613, 1998.

Masajiro Iwasaki and Daisuke Miyazaki. Optimization of indexing based on k-nearest
neighbor graph for proximity search in high-dimensional data. arXiv preprint
arXiv:1810.07355, 2018.

Gautier Izacard and Edouard Grave. Leveraging passage retrieval with generative models
for open domain question answering. In Proceedings of the 16th Conference of the
European Chapter of the Association for Computational Linguistics, pages 874–880,
2021.

Gautier Izacard, Patrick Lewis, Maria Lomeli, Lucas Hosseini, Fabio Petroni, Timo
Schick, Jane Dwivedi-Yu, Armand Joulin, Sebastian Riedel, and Edouard Grave. At-
las: Few-shot learning with retrieval augmented language models. arXiv preprint
arXiv:2208.03299, 2022.

Jeff Johnson, Matthijs Douze, and Hervé Jégou. Billion-scale similarity search with GPUs.
IEEE Transactions on Big Data, 7(3):535–547, 2019.

Mandar Joshi, Eunsol Choi, Daniel S Weld, and Luke Zettlemoyer. TriviaQA: A large
scale distantly supervised challenge dataset for reading comprehension. In Proceedings
of the 55th Annual Meeting of the Association for Computational Linguistics, pages
1601–1611, 2017.

Hervé Jégou, Matthijs Douze, and Cordelia Schmid. Product quantization for nearest
neighbor search. IEEE Transactions on Pattern Analysis and Machine Intelligence, 33
(1):117–128, 2011.

Yannis Kalantidis and Yannis Avrithis. Locally optimized product quantization for ap-
proximate nearest neighbor search. In Proceedings of the IEEE Conference on Com-
puter Vision and Pattern Recognition, pages 2321–2328, 2014.

Vladimir Karpukhin, Barlas Oguz, Sewon Min, Patrick Lewis, Ledell Wu, Sergey Edunov,
Danqi Chen, and Wen-tau Yih. Dense passage retrieval for open-domain question
answering. In Proceedings of the 2020 Conference on Empirical Methods in Natural
Language Processing, pages 6769–6781, 2020.

Kalpesh Krishna, Siddharth Khosla, Jeffrey P. Bigham, and Zachary C. Lipton. Gener-
ating question-answer hierarchies. In Proceedings of the 59th Annual Meeting of the
Association for Computational Linguistics, pages 5546–5561, 2021.

Tom Kwiatkowski, Jennimaria Palomaki, Olivia Redfield, Michael Collins, Ankur Parikh,
Chris Alberti, Danielle Epstein, Illia Polosukhin, Jacob Devlin, Kenton Lee, et al.
Natural Questions: A benchmark for question answering research. Transactions of
the Association for Computational Linguistics, 7:453–466, 2019.

Kenton Lee, Ming-Wei Chang, and Kristina Toutanova. Latent retrieval for weakly su-
pervised open domain question answering. In Proceedings of the 57th Annual Meeting
of the Association for Computational Linguistics, pages 6086–6096, 2019.

Patrick Lewis, Ethan Perez, Aleksandra Piktus, Fabio Petroni, Vladimir Karpukhin,
Naman Goyal, Heinrich Küttler, Mike Lewis, Wen-tau Yih, Tim Rocktäschel, Sebastian
Riedel, and Douwe Kiela. Retrieval-augmented generation for knowledge-intensive NLP
tasks. In Advances in Neural Information Processing Systems, volume 33, pages 9459–
9474, 2020.

Sharath Li, Lennart Stenzel, Carsten Eickhoff, and Seyed Ali Bahrainian. Enhancing
retrieval-augmented generation: A study of best practices. In Proceedings of the 2024
International Conference on Computational Linguistics, pages 449–461, 2024.

Jimmy Lin, Xueguang Ma, Sheng-Chieh Lin, Jheng-Hong Yang, Ronak Pradeep, and
Rodrigo Nogueira. Few-shot learning with siamese networks and label tuning. In
Proceedings of the 44th International ACM SIGIR Conference on Research and Devel-
opment in Information Retrieval, pages 2356–2362, 2021.

Tie-Yan Liu. Learning to rank for information retrieval. Foundations and Trends in
Information Retrieval, 3(3):225–331, 2009.

Yi Luan, Jacob Eisenstein, Kristina Toutanova, and Michael Collins. Sparse, dense,
and attentional representations for text retrieval. Transactions of the Association for
Computational Linguistics, 9:329–345, 2021.

Yury A Malkov and Dmitry A Yashunin. Efficient and robust approximate nearest neigh-
bor search using hierarchical navigable small world graphs. IEEE Transactions on
Pattern Analysis and Machine Intelligence, 42(4):824–836, 2018.

Yuning Mao, Pengcheng He, Xiaodong Liu, Yelong Shen, Jianfeng Gao, Jiawei Han, and
Weizhu Chen. Generation-augmented retrieval for open-domain question answering.
In Proceedings of the 59th Annual Meeting of the Association for Computational Lin-
guistics, pages 4089–4100, 2021.

Tomas Mikolov, Kai Chen, Greg Corrado, and Jeffrey Dean. Efficient estimation of word
representations in vector space. arXiv preprint arXiv:1301.3781, 2013.

Rodrigo Nogueira and Kyunghyun Cho. Passage re-ranking with BERT. arXiv preprint
arXiv:1901.04085, 2019.

Fabio Petroni, Aleksandra Piktus, Angela Fan, Patrick Lewis, Majid Yazdani, Nicola
De Cao, James Thorne, Yacine Jernite, Vladimir Karpukhin, Jean Maillard, et al.
KILT: a benchmark for knowledge intensive language tasks. In Proceedings of the
2021 Conference of the North American Chapter of the Association for Computational
Linguistics, pages 2523–2544, 2021.

Liudmila Prokhorenkova and Alexander Shekhovtsov. Graph-based nearest neighbor
search: From practice to theory. In Proceedings of the 37th International Conference
on Machine Learning, pages 7803–7813, 2020.

Yingqi Qu, Yuchen Ding, Jing Liu, Kai Liu, Ruiyang Ren, Wayne Xin Zhao, Daxiang
Dong, Hua Wu, and Haifeng Wang. RocketQA: An optimized training approach to
dense passage retrieval for open-domain question answering. In Proceedings of the
2021 Conference of the North American Chapter of the Association for Computational
Linguistics, pages 5835–5847, 2021.

Pranav Rajpurkar, Jian Zhang, Konstantin Lopyrev, and Percy Liang. SQuAD: 100,000+
questions for machine comprehension of text. In Proceedings of the 2016 Conference
on Empirical Methods in Natural Language Processing, pages 2383–2392, 2016.

Hannah Rashkin, Xi Victoria Lin, Guy Tyen, Maarten Sap, Wen-tau Yih, and Yejin
Choi. Measuring factuality in text generation with attributable sources. arXiv preprint
arXiv:2305.14251, 2023.

Stephen Robertson and Hugo Zaragoza. The probabilistic relevance framework: BM25
and beyond. Foundations and Trends in Information Retrieval, 3(4):333–389, 2009.

Stephen E Robertson and Karen Spärck Jones. Relevance weighting of search terms.
Journal of the American Society for Information Science, 27(3):129–146, 1976.

Stephen E Robertson et al. The Okapi/Keenbow experiments: Probabilistic retrieval for
TREC-3. NIST Special Publication SP, pages 21–21, 1995.

Gerard Salton, Anita Wong, and Chung-Shu Yang. A vector space model for automatic
indexing. Communications of the ACM, 18(11):613–620, 1975.

Gerard Salton, Edward A Fox, and Harry Wu. Introduction to modern information
retrieval. In Proceedings of the ACM Conference, 1983.

Yixuan Shao, Xinyang Geng, Yizhe Liu, Chulaka Gunasekara, Dian Jiang, Nanyun Peng,
and Marjan Ghazvininejad. Self-RAG: Learning to retrieve, generate, and critique
through self-reflection. arXiv preprint arXiv:2310.11511, 2023.

Kurt Shuster, Spencer Poff, Moya Chen, Douwe Kiela, and Jason Weston. Retrieval
augmentation reduces hallucination in conversation. In Findings of the Association for
Computational Linguistics: EMNLP 2021, pages 3784–3803, 2021.

Chengwei Si, Zhengyuan Chen, Ning Ding, and William Yang Wang. Prompting and
evaluating large language models for retrieval-augmented generation. arXiv preprint
arXiv:2306.10023, 2023.

Josef Sivic and Andrew Zisserman. Video Google: A text retrieval approach to object
matching in videos. In Proceedings of the 9th IEEE International Conference on Com-
puter Vision, pages 1470–1477, 2003.

Suhas J Subramanya, Fnu Devvrit, Harsha Raghavan, Vijay Badrinarayanan, and Shan-
mugavelayutham Muthukrishnan. DiskANN: Fast accurate billion-point nearest neigh-
bor search on a single node. In Advances in Neural Information Processing Systems,
volume 32, pages 13766–13776, 2019.

Nandan Thakur, Nils Reimers, Andreas Rücklé, Abhishek Srivastava, and Iryna
Gurevych. BEIR: A heterogeneous benchmark for zero-shot evaluation of information
retrieval models. In Advances in Neural Information Processing Systems, volume 34,
pages 21545–21561, 2021.

James Thorne, Andreas Vlachos, Christos Christodoulopoulos, and Arpit Mittal.
FEVER: a large-scale dataset for fact extraction and VERification. In Proceedings
of the 2018 Conference of the North American Chapter of the Association for Compu-
tational Linguistics, pages 809–819, 2018.

Ivan Vulić and Marie-Francine Moens. Monolingual and cross-lingual information re-
trieval models based on (bilingual) word embeddings. In Proceedings of the 38th In-
ternational ACM SIGIR Conference on Research and Development in Information Re-
trieval, pages 363–372. ACM, 2015.

Guanting Wang, Tu Vu, Tsendsuren Munkhdalai, Alessandro Sordoni, Adam Trischler,
Andrew Mattarella-Micke, Subhransu Maji, and Mohit Iyyer. Precise zero-shot dense
retrieval without relevance labels. arXiv preprint arXiv:2212.10496, 2023.

Roger Weber, Hans-Jörg Schek, and Stephen Blott. A quantitative analysis and perfor-
mance study for similarity-search methods in high-dimensional spaces. In Proceedings
of the 24th International Conference on Very Large Data Bases, pages 194–205, 1998.

Lee Xiong, Chenyan Xiong, Ye Li, Kwok-Fung Tang, Jialin Liu, Paul Bennett, Junaid
Ahmed, and Arnold Overwijk. Approximate nearest neighbor negative contrastive
learning for dense text retrieval. In International Conference on Learning Representa-
tions, 2021.

Zhilin Yang, Peng Qi, Saizheng Zhang, Yoshua Bengio, William Cohen, Ruslan Salakhut-
dinov, and Christopher D Manning. HotpotQA: A dataset for diverse, explainable
multi-hop question answering. In Proceedings of the 2018 Conference on Empirical
Methods in Natural Language Processing, pages 2369–2380, 2018.
