Chat with PDFs Using Gen-AI and AWS Bedrock
Data Ingestion: We will load a set of PDF documents into the system, split them into
manageable chunks, and create vector embeddings using the Titan embedding model from
AWS Bedrock.
Vector Store Creation: The vector embeddings will be stored in a vector store, which will
be used for efficient similarity search and retrieval of relevant information.
Language Model Integration: We will integrate the Amazon Titan and Llama 2 language
models from AWS Bedrock to generate the final responses to user queries, leveraging the
retrieved information from the vector store.
Streamlit-based User Interface: The application will be built using the Streamlit
framework, providing a user-friendly interface for interacting with the document Q&A
system.
Step 1: First we will get model access in Bedrock, and then configure the AWS Bedrock service for our region. So first, go to AWS Bedrock.
Once in Bedrock, go to Model access and click on Modify Model Access.
As we need an embedding model and a chat model, we will go with the Amazon Titan model and the Llama 2 model.
Importantly, configure the AWS CLI in the same region where you have model access.
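For example, with the `aws configure` command (the region below is only an example; use the region where your model access was granted):

```bash
aws configure
# AWS Access Key ID [None]: <your-access-key-id>
# AWS Secret Access Key [None]: <your-secret-access-key>
# Default region name [None]: us-east-1
# Default output format [None]: json
```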
Now let's create and activate a virtual environment:
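For example (naming the environment `venv` is just a convention):

```bash
python -m venv venv
source venv/bin/activate   # on Windows: venv\Scripts\activate
```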
Next, install the required libraries for the project:
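The exact dependency list isn't reproduced here, but a typical set for this Bedrock + LangChain + FAISS + Streamlit stack is:

```bash
pip install boto3 awscli streamlit pypdf faiss-cpu langchain langchain-community
```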
After installing these libraries, create a folder named data (to store your PDFs) in the same place where you created the venv. Then copy the code below into a file named app.py:
This is how everything should be aligned; now let's understand the code, part by part:
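Assuming the steps above, the project folder looks roughly like this:

```
project/
├── venv/          # virtual environment
├── data/          # your PDF files go here
│   └── example.pdf
└── app.py
```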
Load PDFs: We use the PyPDF library to read all PDF files from a folder named data.
Split the Text: Since PDFs can be lengthy, we break the text into smaller parts using a tool called a
"Text Splitter." This ensures the system can process and analyse manageable chunks of data.
Result: The system now has smaller, easy-to-handle text segments ready for further processing.
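Here's a minimal sketch of this step, using LangChain's directory loader and text splitter (the chunk size and overlap are typical values, not necessarily the original settings):

```python
from langchain_community.document_loaders import PyPDFDirectoryLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter

def data_ingestion():
    # Read every PDF in the local "data" folder
    loader = PyPDFDirectoryLoader("data")
    documents = loader.load()

    # Break long documents into overlapping chunks the models can handle
    text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=100)
    return text_splitter.split_documents(documents)
```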
FAISS Vector Store: We use FAISS, a tool that creates a special database for finding similar content.
Think of it like a search engine for your PDF data.
AI-Powered Embeddings: Using the Titan model from AWS Bedrock, we convert text chunks into
"embeddings" (numerical representations of meaning). These embeddings allow the system to
identify and rank the most relevant text when answering questions.
Result: A powerful, AI-enabled knowledge base that can quickly locate relevant information in your
PDFs.
Here's a minimal sketch of the `get_vector_store` function, assuming the Titan embedding model `amazon.titan-embed-text-v1` and a locally saved FAISS index:
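```python
import boto3
from langchain_community.embeddings import BedrockEmbeddings
from langchain_community.vectorstores import FAISS

# Bedrock runtime client; uses the region set earlier with `aws configure`
bedrock = boto3.client(service_name="bedrock-runtime")
bedrock_embeddings = BedrockEmbeddings(model_id="amazon.titan-embed-text-v1", client=bedrock)

def get_vector_store(docs):
    # Embed every chunk with Titan and index the vectors with FAISS
    vectorstore_faiss = FAISS.from_documents(docs, bedrock_embeddings)
    # Persist the index so we don't re-embed on every run
    vectorstore_faiss.save_local("faiss_index")
```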
Amazon Titan: This model is great for mid-sized text processing and efficient answers.
Llama 2: A highly capable AI model for detailed and conversational responses.
How It Works: These models analyze the question, retrieve the relevant chunks of text from the
knowledge base, and then generate accurate, meaningful responses.
Result: The system now understands user questions and generates natural, helpful answers.
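Getting handles to the two models through LangChain's Bedrock wrapper might look like this (the model IDs are plausible Bedrock identifiers, assumed rather than copied from the original code):

```python
from langchain_community.llms import Bedrock

def get_titan_llm():
    # Amazon Titan text model served by Bedrock
    return Bedrock(model_id="amazon.titan-text-express-v1", client=bedrock)

def get_llama2_llm():
    # Meta Llama 2 chat model served by Bedrock
    return Bedrock(model_id="meta.llama2-70b-chat-v1", client=bedrock,
                   model_kwargs={"max_gen_len": 512})
```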
Generate a Response: The selected text chunks are passed to the AI model (Titan or Llama 2). The
model crafts a precise response based on the retrieved content.
Example: Ask “What’s mentioned about X in the PDF?” and the system will find the most relevant
section and answer it in simple language.
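A common way to wire this together is LangChain's RetrievalQA chain; the prompt wording and the top-3 retrieval setting below are illustrative choices:

```python
from langchain.chains import RetrievalQA
from langchain.prompts import PromptTemplate

prompt_template = """Use the following context to answer the question at the end.
If you don't know the answer, just say that you don't know.

{context}

Question: {question}
Helpful Answer:"""

PROMPT = PromptTemplate(template=prompt_template,
                        input_variables=["context", "question"])

def get_response_llm(llm, vectorstore_faiss, query):
    qa = RetrievalQA.from_chain_type(
        llm=llm,
        chain_type="stuff",  # stuff all retrieved chunks into a single prompt
        retriever=vectorstore_faiss.as_retriever(search_kwargs={"k": 3}),
        return_source_documents=True,
        chain_type_kwargs={"prompt": PROMPT},
    )
    return qa.invoke({"query": query})["result"]
```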
Sidebar Option: Update or create the knowledge base with the latest PDFs.
Question Box: Users can type their questions directly.
AI Model Buttons: Choose between Titan or Llama 2 for generating responses.
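A minimal Streamlit layout with these three controls, assuming the helper functions sketched above:

```python
import streamlit as st

def main():
    st.set_page_config(page_title="Chat with PDFs")
    st.header("Chat with PDFs using AWS Bedrock")

    # Question box
    user_question = st.text_input("Ask a question about your PDF files")

    # Sidebar option: rebuild the knowledge base from the PDFs in data/
    with st.sidebar:
        st.title("Update or create the vector store")
        if st.button("Vectors Update"):
            with st.spinner("Processing..."):
                get_vector_store(data_ingestion())
                st.success("Done")

    # AI model buttons (allow_dangerous_deserialization is required by
    # newer langchain versions when loading a local pickled index)
    if st.button("Titan Output"):
        faiss_index = FAISS.load_local("faiss_index", bedrock_embeddings,
                                       allow_dangerous_deserialization=True)
        st.write(get_response_llm(get_titan_llm(), faiss_index, user_question))

    if st.button("Llama 2 Output"):
        faiss_index = FAISS.load_local("faiss_index", bedrock_embeddings,
                                       allow_dangerous_deserialization=True)
        st.write(get_response_llm(get_llama2_llm(), faiss_index, user_question))

if __name__ == "__main__":
    main()
```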
How It Works:
Load PDFs into the knowledge base via the sidebar button.
Command to run the app (from the project folder):
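```bash
streamlit run app.py
```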