
Chat with PDFs Using Gen-AI and AWS Bedrock

This document outlines a process for creating a document Q&A system that allows users to interact with uploaded PDFs using AWS Bedrock's language models, specifically Amazon Titan and Llama 2. The workflow includes uploading PDFs, extracting text, creating vector embeddings, and generating responses to user queries through a Streamlit interface. Key steps involve data ingestion, vector store creation, language model integration, and user interface development to facilitate efficient information retrieval from the PDFs.

Uploaded by Soumyajit Das

Unlocking Gen-AI for Document Management: Chat with PDFs Using AWS Bedrock


__________________________________________________
In this blog, I’ll show you how to chat with your uploaded PDFs using Gen-AI. We will use
AWS Bedrock’s language models, like Amazon Titan and Llama 2, to create a document
Q&A system that can easily extract and summarize information from PDF files.

The workflow will be as follows:

1. Upload PDF Documents
2. Extract Text from PDFs
3. Convert Text to Vectors & Store Them
4. Ask Questions & Retrieve Relevant Vectors
5. Process Data using AWS Bedrock
6. Display Results

The project consists of the following key steps:

Data Ingestion: We will load a set of PDF documents into the system, split them into
manageable chunks, and create vector embeddings using the Titan embedding model from
AWS Bedrock.

Vector Store Creation: The vector embeddings will be stored in a vector store, which will
be used for efficient similarity search and retrieval of relevant information.
Language Model Integration: We will integrate the Amazon Titan and Llama 2 language
models from AWS Bedrock to generate the final responses to user queries, leveraging the
retrieved information from the vector store.

Streamlit-based User Interface: The application will be built using the Streamlit
framework, providing a user-friendly interface for interacting with the document Q&A
system.

Step 1: First, we will get model access in Bedrock and then configure the AWS Bedrock
service for our region. So first, go to AWS Bedrock.

After going to Bedrock, go to Model access and click on Modify model access.

Since we need an embedding model and a chat model, we will go with the Amazon Titan
and Llama 2 models.
The important thing is to configure the AWS CLI in the same region where you have model access.
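As a sketch, pointing the CLI at a region can look like this. The region and key values below are placeholders, and this assumes the AWS CLI is already installed:

```shell
# Placeholder values -- substitute your own credentials and the region
# where you actually enabled Bedrock model access.
aws configure set region us-east-1
aws configure set aws_access_key_id YOUR_ACCESS_KEY_ID
aws configure set aws_secret_access_key YOUR_SECRET_ACCESS_KEY
```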

Now let’s create a virtual environment and install the required libraries

Creating a virtual environment:
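On Linux/macOS this looks like the following (on Windows, activate with `venv\Scripts\activate` instead):

```shell
# Create a virtual environment named venv (Python 3.8+ assumed)
python3 -m venv venv
# Activate it so python and pip resolve inside the venv
. venv/bin/activate
```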

Verify your current environment:
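A quick sanity check looks like this:

```shell
which python3   # should point to .../venv/bin/python3 when the venv is active
pip3 --version  # should report the venv's site-packages path
```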


You're inside a Python venv, as indicated by the (venv) in your shell prompt. This means Python and
pip from this environment are active.

Now we need to install the required libraries for the project.

Command to install the libraries:

After installing these libraries, create a folder named `data` (to store your PDFs) in the same
place where you created the venv, and then copy the app.py code below:
So this is how everything should be laid out. Now let's understand the code:

Step 1: Load and Prepare PDF Data / Data Ingestion


The first step is to process your PDFs so the system can work with the content effectively. Here’s
how it works:

Load PDFs: We use the PyPDF library to read all PDF files from a folder named data.
Split the Text: Since PDFs can be lengthy, we break the text into smaller parts using a tool called a
"Text Splitter." This ensures the system can process and analyse manageable chunks of data.

Result: The system now has smaller, easy-to-handle text segments ready for further processing.

Here’s the code for the `data_ingestion` function:

Step 2: Create a Searchable Knowledge Base / Vector Store Creation


Next, we transform the prepared data into a format that allows for fast and accurate searching:

FAISS Vector Store: We use FAISS, a tool that creates a special database for finding similar content.
Think of it like a search engine for your PDF data.

AI-Powered Embeddings: Using the Titan model from AWS Bedrock, we convert text chunks into
"embeddings" (numerical representations of meaning). These embeddings allow the system to
identify and rank the most relevant text when answering questions.

Result: A powerful, AI-enabled knowledge base that can quickly locate relevant information in your
PDFs.
Here’s the code for the `get_vector_store` function:

Step 3: Connect AI Language Models / Language Model Integration


We add intelligence to the system by integrating advanced AI models to generate responses:

Amazon Titan: This model is great for mid-sized text processing and efficient answers.
Llama 2: A highly capable AI model for detailed and conversational responses.
How It Works: These models analyze the question, retrieve the relevant chunks of text from the
knowledge base, and then generate accurate, meaningful responses.

Result: The system now understands user questions and generates natural, helpful answers.

Here’s the code for the `get_titan_llm` and `get_lama2_llm` functions:

Step 4: Retrieve Information and Generate Responses


Here’s what happens when a question is asked:
Search the Knowledge Base: The system looks for the most relevant text chunks using the FAISS
vector store.

Generate a Response: The selected text chunks are passed to the AI model (Titan or Llama 2). The
model crafts a precise response based on the retrieved content.

Example: Ask “What’s mentioned about X in the PDF?” and the system will find the most relevant
section and answer it in simple language.

Here’s the code for the `get_response_llm` function:

Step 5: Create a User-Friendly Interface / Streamlit based User Interface


To make the system easy to use, we build a web app using Streamlit:

Sidebar Option: Update or create the knowledge base with the latest PDFs.
Question Box: Users can type their questions directly.
AI Model Buttons: Choose between Titan or Llama 2 for generating responses.
How It Works:
Load PDFs into the knowledge base via the sidebar button.

Enter a question, like “Summarize document Y.”

Click on the preferred AI model (Titan or Llama 2) to get a response instantly.


Here’s the code for the main Streamlit application:

So that's all. Let's run our application with the following command:

streamlit run app.py
