Building A PDF Knowledge Bot With Open-Source LLMs - A Step-by-Step Guide - Shakudo
In this tutorial, we will create a personalized Q&A app that can extract information from
using your selected open-source Large Language Models (LLMs). We will cover the be
open-source LLMs, look at some of the best ones available, and demonstrate how to d
LLM-powered applications using Shakudo.
The ecosystems of Hugging Face, LangChain, and PyTorch make open-source models easy t
for specific use cases.
Cost efficiency is another vital benefit of employing open-source LLMs. For small-scal
requests/day), OpenAI's ChatGPT API is relatively cost-effective at around $1.30/d
use (millions of requests/day), it can quickly rise to $1,300/day. In contrast, open-sourc
NVIDIA A100 cost approximately $4/hour or $96/day.
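The arithmetic behind this comparison can be sketched as follows. This is a rough illustration that uses only the figures quoted above ($4/hour for an A100 and $1,300/day for the API at high volume). It assumes a single GPU runs around the clock and ignores throughput limits, storage, and engineering costs.

```python
# Rough daily-cost comparison using the figures quoted in the text.
# Assumption: one A100 instance runs 24 hours a day at a flat hourly rate.
A100_COST_PER_HOUR = 4.00          # USD per hour, from the text
API_COST_HIGH_VOLUME = 1300.00     # USD per day at millions of requests/day, from the text


def gpu_daily_cost(hourly_rate: float, hours: int = 24) -> float:
    """Cost of keeping one GPU instance running for the given number of hours."""
    return hourly_rate * hours


daily = gpu_daily_cost(A100_COST_PER_HOUR)
ratio = API_COST_HIGH_VOLUME / daily

print(f"A100 instance: ${daily:.2f}/day")
print(f"API at high volume costs roughly {ratio:.1f}x as much per day")
```

Whether one GPU can actually serve that request volume depends on the model and workload, so the ratio is an upper bound on the savings rather than a guarantee.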
Lower Latency
For applications where real-time user interaction is crucial, the high latency of GPT-4 c
drawback. When optimized and deployed efficiently, open-source models can offer mu
which makes them more suitable for user interfacing applications.
https://www.shakudo.io/blog/build-pdf-bot-open-source-llms 2/13
22/9/23, 12:05 Building a PDF Knowledge Bot With Open-Source LLMs - A Step-by-Step Guide | Shakudo
Experimentation and development are crucial elements in the field of data science. Sha
facilitates the selection of the appropriate computing resources. It provides the flexibi
Jupyter Notebooks, VS Code Server (provided by the platform) or connecting via SSH
local editor.
SBERT : SBERT maps sentences and paragraphs to vectors using a BERT-like mod
when we’re prototyping our application.
Hugging Face's MTEB leaderboard compares embedding models on different tasks. Ins
very highly on this list, even better than OpenAI's ADA.
EMB_INSTRUCTOR_XL = "hkunlp/instructor-xl"
EMB_SBERT_MPNET_BASE = "sentence-transformers/all-mpnet-base-v2"
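Whichever embedding model is chosen, retrieval comes down to comparing vectors. The sketch below is a toy illustration in plain Python: the hand-made 3-dimensional vectors stand in for real embeddings (all-mpnet-base-v2 produces 768-dimensional vectors), and the sentences and numbers are invented for the example.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Hypothetical embeddings: a real model would produce these vectors.
toy_embeddings = {
    "How do I reset my password?": [0.9, 0.1, 0.0],
    "Steps to recover account access": [0.8, 0.3, 0.1],
    "Quarterly revenue grew by 10%": [0.0, 0.2, 0.9],
}

# Hypothetical embedding of the user's question.
query_vector = [1.0, 0.0, 0.0]

# Rank stored sentences from most to least similar to the query.
ranked = sorted(
    toy_embeddings,
    key=lambda sentence: cosine_similarity(query_vector, toy_embeddings[sentence]),
    reverse=True,
)

for sentence in ranked:
    score = cosine_similarity(query_vector, toy_embeddings[sentence])
    print(f"{score:.3f}  {sentence}")
```

A vector store performs the same kind of ranking, only over many more vectors and with an index to keep the search fast.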
FlanT5 Models : FlanT5 is a text2text generator that is fine-tuned on several tasks like
answering questions. It uses the encoder-decoder architecture of transformers. The
2.0 licensed, which can be used commercially.
FastChatT5 3b Model : It's a FlanT5-based chat model trained by fine tuning FlanT5
ChatGPT. The model is Apache 2.0 licensed.
LLM_FLAN_T5_XXL = "google/flan-t5-xxl"
LLM_FLAN_T5_XL = "google/flan-t5-xl"
LLM_FASTCHAT_T5_XL = "lmsys/fastchat-t5-3b-v1.0"
LLM_FLAN_T5_SMALL = "google/flan-t5-small"
LLM_FLAN_T5_BASE = "google/flan-t5-base"
LLM_FLAN_T5_LARGE = "google/flan-t5-large"
LLM_FALCON_SMALL = "tiiuae/falcon-7b-instruct"
Let’s go ahead and first set up SBERT for the embedding model and FLANT5-Base fo
model. We chose these models because they can run on an 8-core CPU. FastChat-T5 a
7B require a GPU. Loading them is similar and is shown in the codebase:
config = {
    "persist_directory": None,
    "load_in_8bit": False,
    "embedding": EMB_SBERT_MPNET_BASE,
    "llm": LLM_FLAN_T5_BASE,
}
To employ these models, we use Hugging Face pipelines, which simplify the process of
and using them for inference.
The auto device map feature assists in efficiently loading the language model (LLM
memory. If the entire model cannot fit in the GPU memory, some layers are loaded o
memory instead. If the model still cannot fit completely, the remaining weights are s
until needed.
Loading in 8-bit quantizes the LLM and can lower the memory requirements by hal
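As a back-of-the-envelope check on the memory savings, the sketch below estimates the size of the weights alone at different precisions. The parameter count is a round, assumed figure for a 7B-parameter model, and the estimate ignores activations, the key-value cache, and any quantization overhead.

```python
# Bytes needed to store one parameter at each precision.
BYTES_PER_PARAM = {"float32": 4, "float16": 2, "int8": 1}


def weight_memory_gb(num_params: int, dtype: str) -> float:
    """Approximate memory for the weights only, in gigabytes (1 GB = 1e9 bytes)."""
    return num_params * BYTES_PER_PARAM[dtype] / 1e9


# Assumption: a round 7 billion parameters, roughly the size of a 7B model.
num_params = 7_000_000_000

for dtype in BYTES_PER_PARAM:
    print(f"{dtype:>8}: {weight_memory_gb(num_params, dtype):.1f} GB")
```

Going from 16-bit to 8-bit weights halves the weight footprint in this estimate, which is where the memory reduction described above comes from.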
The creation of the models is governed by the configuration settings and is handled by
create_sbert_mpnet() and create_flan_t5_base() functions, respectively.
import torch
from transformers import AutoTokenizer, pipeline
# The LangChain import path may differ between versions
from langchain.embeddings import HuggingFaceEmbeddings


def create_sbert_mpnet():
    # Use the GPU when one is available, otherwise fall back to the CPU
    device = "cuda" if torch.cuda.is_available() else "cpu"
    return HuggingFaceEmbeddings(model_name=EMB_SBERT_MPNET_BASE, mo


def create_flan_t5_base(load_in_8bit=False):
    # Wrap it in HF pipeline for use with LangChain
    model = "google/flan-t5-base"
    tokenizer = AutoTokenizer.from_pretrained(model)
    return pipeline(
        task="text2text-generation",
        model=model,
        tokenizer=tokenizer,
        max_new_tokens=100,
        model_kwargs={"device_map": "auto", "load_in_8bit": load_in_
    )


# Create the embedding model and the LLM selected in the config
if config["embedding"] == EMB_SBERT_MPNET_BASE:
    embedding = create_sbert_mpnet()
load_in_8bit = config["load_in_8bit"]
if config["llm"] == LLM_FLAN_T5_BASE:
    llm = create_flan_t5_base(load_in_8bit=load_in_8bit)
If we want to load Falcon, the pipeline would be as below and its task is ”text-generatio
decoder-only model. We need to allow remote code execution because the code come
author’s repository and not from Hugging Face.
def create_falcon_instruct_small(load_in_8bit=False):
    model = "tiiuae/falcon-7b-instruct"
    tokenizer = AutoTokenizer.from_pretrained(model)
    hf_pipeline = pipeline(
        task="text-generation",
        model=model,
        tokenizer=tokenizer,
        trust_remote_code=True,
        max_new_tokens=100,
        model_kwargs={
            "device_map": "auto",
            "load_in_8bit": load_in_8bit,
            "max_length": 512,
            "temperature": 0.01,
            "torch_dtype": torch.bfloat16,
        },
    )
    return hf_pipeline
This setup forms the foundation of the knowledge bot's capability to understand and g
to textual input.
persist_directory = config["persist_directory"]
vectordb = Chroma.from_documents(documents=texts, embedding=embedding, p
hf_llm = HuggingFacePipeline(pipeline=llm)
retriever = vectordb.as_retriever(search_kwargs={"k":4})
qa = RetrievalQA.from_chain_type(llm=hf_llm, chain_type="stuff",retrieve
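Conceptually, the "stuff" chain type places all k retrieved chunks into a single prompt alongside the question. The sketch below imitates that step in plain Python. The helper name build_stuff_prompt, the prompt wording, and the sample chunks are assumptions made for illustration; this is not LangChain's actual template.

```python
def build_stuff_prompt(question: str, chunks: list[str], k: int = 4) -> str:
    """Concatenate the top-k retrieved chunks and the question into one prompt."""
    context = "\n\n".join(chunks[:k])
    return (
        "Use the following context to answer the question.\n\n"
        f"{context}\n\n"
        f"Question: {question}\n"
        "Answer:"
    )


# Hypothetical retrieval results, already sorted from most to least relevant.
retrieved_chunks = [
    "chunk 1: The warranty lasts two years.",
    "chunk 2: Claims require proof of purchase.",
    "chunk 3: Batteries are covered for six months.",
    "chunk 4: Shipping damage must be reported within a week.",
    "chunk 5: The company was founded in 1999.",
]

prompt = build_stuff_prompt("How long is the warranty?", retrieved_chunks, k=4)
print(prompt)
```

Because every retrieved chunk goes into one prompt, the value of k and the chunk size together have to fit within the model's context length.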
Finally, we query the LLM using our question. The PDF knowledge bot will return the r
extracted from the PDF.
To make the code more organized, we can encapsulate all functionalities into a class.
class PdfQA:
    def __init__(self, config: dict = None):
        # Use a fresh dict per instance instead of a shared mutable default
        self.config = config if config is not None else {}
        self.embedding = None
        self.vectordb = None
        self.llm = None
        self.qa = None
        self.retriever = None
    ...
    # Check out the full script on the GitHub link in the intro
We can now initialize and run the PdfQA class with the following code:
# Initialize PdfQA
pdfqa = PdfQA(config=config)
pdfqa.init_embeddings()
pdfqa.init_models()
# Create Vector DB
pdfqa.vector_db_pdf()
Shakudo integrates with various tools you can choose to build your front end. For this
web application around our PdfQA class with Streamlit, a Python library that simplifies
import streamlit as st
from pdf_qa import PdfQA
from pathlib import Path
from tempfile import NamedTemporaryFile
import time
import shutil
from constants import * ## constants.py file can be found in code
Now, let’s set the page configuration and have a session state of the class to avoid inst
multiple times in the same session.
To load the model and embedding on the GPU or CPU only once across all the client se
the LLM and embedding pipelines.
if llm == LLM_OPENAI_GPT35:
    pass
elif llm == LLM_FLAN_T5_SMALL:
    return PdfQA.create_flan_t5_small(load_in_8bit)
elif llm == LLM_FLAN_T5_BASE:
    return PdfQA.create_flan_t5_base(load_in_8bit)
elif llm == LLM_FLAN_T5_LARGE:
    return PdfQA.create_flan_t5_large(load_in_8bit)
elif llm == LLM_FASTCHAT_T5_XL:
    return PdfQA.create_fastchat_t5_xl(load_in_8bit)
elif llm == LLM_FALCON_SMALL:
    return PdfQA.create_falcon_instruct_small(load_in_8bit)
else:
    raise ValueError("Invalid LLM setting")
def load_emb(emb):
    if emb == EMB_INSTRUCTOR_XL:
        return PdfQA.create_instructor_xl()
    elif emb == EMB_SBERT_MPNET_BASE:
        return PdfQA.create_sbert_mpnet()
    elif emb == EMB_SBERT_MINILM:
        pass  ## ChromaDB takes care
    else:
        raise ValueError("Invalid embedding setting")
Create our Streamlit app sidebar to include radio buttons for model selection and a file
file is submitted, it triggers the model loading and PDF ingestion to create a vector sto
with st.sidebar:
    emb = st.radio("**Select Embedding Model**", [EMB_INSTRUCTOR_XL, EMB
    llm = st.radio("**Select LLM Model**", [LLM_FASTCHAT_T5_XL, LLM_FLAN
    load_in_8bit = st.radio("**Load 8 bit**", [True, False], index=1)
    pdf_file = st.file_uploader("**Upload PDF**", type="pdf")
Add a text input box for the question. Once we submit the question, it triggers the retr
snippets from the vector store and queries the LLM with an appropriate prompt.
if st.button("Answer"):
    try:
        st.session_state["pdf_qa_model"].retreival_qa_chain()
        answer = st.session_state["pdf_qa_model"].answer_query(question)
        st.write(f"{answer}")
    except Exception as e:
        st.error(f"Error answering the question: {str(e)}")
Finally, our app is ready, and we can deploy it as a service on Shakudo. The platform ma
process easier, allowing you to put your application online quickly.
Deploying applications on Shakudo offers enhanced security and control. Unlike many
Shakudo locks your application behind the SSO or your organization. The services and
models run entirely within your cloud tenancy and on your dedicated Shakudo cluster
the flexibility to avoid vendor lock-in and enabling you to retain control over your appl
the cloud
To deploy your app on Shakudo, we need two key files: pipeline.yaml, which describes
pipeline, and run.sh, a bash script to set up and run our application. Here's what these fi
‘pipeline.yaml’:
pipeline:
  name: "QA demo"
  tasks:
  - name: "QA app"
    type: "bash script"
    port: 8787
    bash_script_path: "LLM/QA_app/run_qa.sh"
‘run.sh’:
cd "$PROJECT_DIR"
export PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION=python
export STREAMLIT_RUNONSSAVE=True
In this script:
Now, our application is live! We can browse through the user interface to see how it wo
Shakudo Services not only simplifies the deployment of your applications but also has
to security. Deploying your models within your Virtual Private Cloud (VPC) is one of the
of hosting models, as it isolates them from the public internet and provides better cont
Conclusion
In this tutorial, we described the advantages of using open-source LLMs over Comme
showed how to integrate OSS LLMs Falcon, FastChat, and FlanT5 to query the interna
with the help of Hugging Face pipelines and LangChain.
Hosting and managing open-source LLMs can be a complex and challenging task. Sha
infrastructure, saving time, resources, and expertise. For a first-hand experience of our
encourage you to reach out to our team and book a demo.
To learn about practical applications of the OpenAI APIs, we recommend read
post about "Building a Confluence Q&A App with LangChain and ChatGPT" where we s
world use case, a chatbot to query your Confluence directories.
RESOURCES:
* The code is adapted based on the work in LLM-WikipediaQA, where the author com
Flan-T5 with ChatGPT running a Q&A on Wikipedia Articles.