Hands-On Guide to Multimodal RAG Systems
Dipanjan (DJ)

Multimodal RAG System
Option 1: Use multimodal embeddings (such as CLIP) to embed images and text
together. Retrieve both using similarity search, but simply link to images in a
docstore. Pass raw images and text chunks to a multimodal LLM for answer synthesis.
Option 2: Use a multimodal LLM (such as GPT-4o, GPT-4V, LLaVA) to produce text
summaries from images. Embed and retrieve the text summaries using a text
embedding model. Again, reference raw text chunks or tables from a docstore for
answer synthesis by a regular LLM; in this case, we exclude images from the
docstore.
Option 3: Use a multimodal LLM (such as GPT-4o, GPT-4V, LLaVA) to produce text,
table, and image summaries (text chunk summaries are optional). Embed and
retrieve the text, table, and image summaries with references to the raw elements, as
in Option 1. Again, raw images, tables, and text chunks are passed to a multimodal
LLM for answer synthesis.
Option 3 works best, especially when your documents contain charts stored as images;
otherwise, you can also generate multimodal embeddings from combined text and images, as in Option 1.
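For reference, Option 1's joint text-image embedding step can be prototyped with CLIP. The sketch below is a minimal illustration, assuming the Hugging Face transformers and Pillow packages; the checkpoint and helper names are illustrative, not part of the original guide.

```python
# Minimal sketch of Option 1: embed text and images into CLIP's shared space.
# The checkpoint name and helper functions are assumptions for illustration.
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

def embed_texts(texts):
    """Embed text chunks so they can be compared against image embeddings."""
    inputs = processor(text=texts, return_tensors="pt", padding=True, truncation=True)
    with torch.no_grad():
        return model.get_text_features(**inputs)

def embed_images(image_paths):
    """Embed images into the same vector space for cross-modal retrieval."""
    images = [Image.open(p).convert("RGB") for p in image_paths]
    inputs = processor(images=images, return_tensors="pt")
    with torch.no_grad():
        return model.get_image_features(**inputs)
```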
We will first use a document parsing tool like Unstructured to extract the text, table and
image elements separately
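A minimal extraction sketch with Unstructured is shown below; the file path is a placeholder, and the parameter names follow recent versions of the unstructured library, so they may differ in older releases.

```python
# Sketch: extract text, table, and image elements from a PDF with Unstructured.
# "report.pdf" is a placeholder; parameters reflect recent unstructured releases.
from unstructured.partition.pdf import partition_pdf

elements = partition_pdf(
    filename="report.pdf",
    strategy="hi_res",                    # layout model needed for tables and images
    infer_table_structure=True,           # keeps an HTML rendering of each table
    extract_image_block_types=["Image"],  # crop detected images
    extract_image_block_to_payload=True,  # store crops as base64 in element metadata
)

texts, tables, images = [], [], []
for el in elements:
    if el.category == "Table":
        tables.append(el.metadata.text_as_html)
    elif el.category == "Image":
        images.append(el.metadata.image_base64)
    elif el.text:
        texts.append(el.text)
```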
Then we will pass each extracted element into a multimodal LLM and generate a
detailed text summary, as depicted above.
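A hedged sketch of this summarization step, assuming LangChain's OpenAI integration and access to GPT-4o (the prompt wording is illustrative):

```python
# Sketch: summarize extracted elements with GPT-4o via LangChain.
from langchain_openai import ChatOpenAI
from langchain_core.messages import HumanMessage

llm = ChatOpenAI(model="gpt-4o", temperature=0)

def summarize_text_or_table(element: str) -> str:
    """Summarize a text chunk or an HTML table for retrieval."""
    prompt = (
        "Summarize the following document element concisely for retrieval. "
        "Preserve key facts, figures, and entities.\n\n" + element
    )
    return llm.invoke(prompt).content

def summarize_image(image_b64: str) -> str:
    """Describe an image (e.g. a chart) in detail so the description can be embedded."""
    message = HumanMessage(content=[
        {"type": "text",
         "text": "Describe this image in detail, including any data in charts or tables."},
        {"type": "image_url",
         "image_url": {"url": f"data:image/jpeg;base64,{image_b64}"}},
    ])
    return llm.invoke([message]).content
```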
Next, we will store the summaries and their embeddings in a vector database, using any
popular embedding model such as OpenAI's embedding models. We will also store the
corresponding raw document element (text, table, image) for each summary in a document
store, which can be any database platform like Redis.
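One way to set up the two stores, sketched here with Chroma as the vector database and an in-memory docstore for simplicity (a Redis-backed store can be swapped in for persistence); the collection and model names are illustrative:

```python
# Sketch: vector DB for summary embeddings, docstore for raw elements.
from langchain_openai import OpenAIEmbeddings
from langchain_chroma import Chroma
from langchain.storage import InMemoryStore  # swap for a Redis-backed store in production

vectorstore = Chroma(
    collection_name="multimodal_rag_summaries",  # illustrative name
    embedding_function=OpenAIEmbeddings(model="text-embedding-3-small"),
)
docstore = InMemoryStore()   # holds the raw text, table HTML, and base64 images
id_key = "doc_id"            # metadata key linking summaries to raw elements
```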
The multi-vector retriever links each summary and its embedding to the original
document’s raw element (text, table, image) using a common document identifier (doc_id).
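Continuing the sketch above, LangChain's MultiVectorRetriever performs this linking; the index_elements helper below is a hypothetical convenience function:

```python
# Sketch: link summaries (vector DB) to raw elements (docstore) via a shared doc_id.
import uuid
from langchain.retrievers.multi_vector import MultiVectorRetriever
from langchain_core.documents import Document

retriever = MultiVectorRetriever(
    vectorstore=vectorstore,   # defined in the previous sketch
    docstore=docstore,
    id_key=id_key,
)

def index_elements(raw_elements, summaries):
    """Add summaries to the vector store and raw elements to the docstore,
    tied together by the same doc_id."""
    doc_ids = [str(uuid.uuid4()) for _ in raw_elements]
    summary_docs = [
        Document(page_content=s, metadata={id_key: doc_ids[i]})
        for i, s in enumerate(summaries)
    ]
    retriever.vectorstore.add_documents(summary_docs)
    retriever.docstore.mset(list(zip(doc_ids, raw_elements)))

# Index each modality separately, e.g.:
# index_elements(texts, text_summaries)
# index_elements(tables, table_summaries)
# index_elements(images, image_summaries)
```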
Now, when a user question comes in, the multi-vector retriever first retrieves the
summaries that are most similar to the question, and then, using the common doc_ids,
returns the original text, table, and image elements, which are passed on to the RAG
system's LLM as the context for answering the user question.
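At query time, the retriever handles both hops (summary similarity search, then the doc_id lookup). A brief continuation of the sketch, with a hypothetical query:

```python
# Sketch: retrieval returns the raw elements, not the summaries.
question = "What trend does the quarterly revenue chart show?"  # hypothetical query
raw_context = retriever.invoke(question)
# The retriever matches summaries by embedding similarity, follows their doc_ids
# into the docstore, and returns the original text, table HTML, and base64 images.
```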
Load all documents and use a document parser like unstructured.io to extract text
chunks, images, and tables.
If necessary, convert HTML tables to Markdown; Markdown tables often work very well
with LLMs.
Pass each text chunk, image, and table into a multimodal LLM like GPT-4o and get a
detailed summary.
Store the summaries in a vector DB and the raw document pieces in a document DB like
Redis.
Connect the two databases with a common document_id using a multi-vector retriever
to identify which summary maps to which raw document piece.
Connect this multi-vector retrieval system with a multimodal LLM like GPT-4o.
Query the system and, based on the summaries most similar to the query, fetch the raw
document pieces, including tables and images, as the context.
Using the above context, generate a response to the question with the multimodal
LLM.
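Putting the last few steps together, a hedged end-to-end sketch that builds on the retriever from the earlier sketches; the base64 check is a crude, hypothetical heuristic for separating image payloads from text and tables:

```python
# Sketch: answer a question with retrieved text, tables, and images via GPT-4o.
import base64
from langchain_openai import ChatOpenAI
from langchain_core.messages import HumanMessage

llm = ChatOpenAI(model="gpt-4o", temperature=0)

def looks_like_base64_image(item: str) -> bool:
    """Crude heuristic: long strings whose prefix decodes as base64 are treated as images."""
    try:
        base64.b64decode(item[:64], validate=True)
        return len(item) > 1000
    except Exception:
        return False

def answer(question: str) -> str:
    raw_context = retriever.invoke(question)   # retriever from the earlier sketch
    text_parts, image_parts = [], []
    for el in raw_context:
        content = el if isinstance(el, str) else getattr(el, "page_content", str(el))
        if looks_like_base64_image(content):
            image_parts.append({"type": "image_url",
                                "image_url": {"url": f"data:image/jpeg;base64,{content}"}})
        else:
            text_parts.append(content)
    message = HumanMessage(content=[
        {"type": "text",
         "text": ("Answer the question using only the context below.\n\n"
                  f"Question: {question}\n\nContext:\n" + "\n\n".join(text_parts))},
        *image_parts,
    ])
    return llm.invoke([message]).content

print(answer("Summarize the key trend shown in the revenue chart."))
```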