Building A PDF Knowledge Bot With Open-Source LLMs - A Step-by-Step Guide - Shakudo

This document provides a step-by-step guide to building a PDF knowledge bot using open-source large language models (LLMs). It discusses the benefits of open-source LLMs, such as customization, cost efficiency, and data security. Popular text generation and embedding models are listed, including Falcon-Instruct, Guanaco, and top-ranked models from the MTEB leaderboard such as Microsoft's e5-large-v2. The guide then outlines building a PDF chatbot on Shakudo that extracts text from PDFs, encodes it with an embedding model, finds similar snippets with vector search, and generates answers using an LLM.

Uploaded by

Pocho Ortiz

22/9/23, 12:05 Building a PDF Knowledge Bot With Open-Source LLMs - A Step-by-Step Guide | Shakudo

Building a PDF Knowledge Bot With Open-Source LLMs - A Step-by-Step Guide
AUTHOR(S): Sabrina Aquino, Sai Kalyan Siddanatham

UPDATED ON: June 19, 2023 in Tutorials

Eliminate the complexity of managing your data stack and focus on analytics, model building,
and deriving insights from data.

In this tutorial, we will create a personalized Q&A app that can extract information from
using your selected open-source Large Language Models (LLMs). We will cover the be
open-source LLMs, look at some of the best ones available, and demonstrate how to d
LLM-powered applications using Shakudo.

If you want to skip directly to code, we’ve made it available on GitHub!

Why are open-source LLMs becoming popular in the AI space?
Let's start by understanding why developers increasingly prefer open-source LLMs o
offerings, like OpenAI's APIs.

Customization and optimization


Open-source LLMs are highly adaptable. They allow users to modify and optimize the
their needs. This flexibility enables the LLMs to understand and process unique data e
https://www.shakudo.io/blog/build-pdf-bot-open-source-llms

The ecosystems of Hugging Face, LangChain, and PyTorch make open-source models easy t
for specific use cases.

Autonomy and cost efficiency


Adopting open-source LLMs significantly reduces dependency on large AI providers,
freedom to select your preferred technology stack. This autonomy minimizes issues re
lock-in and fosters an environment of collaboration within the developer community.

Cost efficiency is another vital benefit of employing open-source LLMs. For small-scal
requests/day), OpenAI's ChatGPT API is relatively cost-effective at around $1.30/d
use (millions of requests/day), it can quickly rise to $1,300/day. In contrast, open-sourc
NVIDIA A100 cost approximately $4/hour or $96/day.
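
The arithmetic behind these figures can be checked with a short sketch. The dollar amounts are the ones quoted above; the variable names and the assumption that a single GPU runs around the clock are ours, and the sketch does not address whether one GPU could actually serve the larger load.

```python
# Daily cost comparison using the figures quoted in the text.
GPU_HOURLY_USD = 4.00          # approximate NVIDIA A100 price per hour
API_SMALL_SCALE_USD = 1.30     # approximate ChatGPT API cost per day, small scale
API_LARGE_SCALE_USD = 1300.00  # approximate ChatGPT API cost per day, large scale


def gpu_daily_cost(hourly_usd: float, hours: int = 24) -> float:
    """Cost of keeping one GPU running for the given number of hours."""
    return hourly_usd * hours


gpu_daily = gpu_daily_cost(GPU_HOURLY_USD)
small_ratio = gpu_daily / API_SMALL_SCALE_USD   # GPU cost relative to small-scale API use
large_ratio = API_LARGE_SCALE_USD / gpu_daily   # large-scale API use relative to GPU cost

print(f"Self-hosted GPU: ${gpu_daily:.2f}/day")
print(f"GPU costs {small_ratio:.0f}x the small-scale API bill")
print(f"Large-scale API bill is {large_ratio:.1f}x the GPU cost")
```

The fixed GPU cost only pays off once API usage grows well past the small-scale figure, which is the point the comparison above is making.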

Enhanced data security


Open-source LLMs provide better data privacy and security. Unlike third-party AI serv
allow you to maintain complete control over your data, which minimizes the risk of data
offers an enterprise license that allows businesses to use and fine-tune their LLMs. Th
businesses address data privacy concerns by allowing them to train the models on the
enterprise license is expensive and requires a significant amount of technical expertise

Fine-tuning can be time-consuming and expensive. It can also be difficult to ensure th


biased or harmful. Open-source LLMs still offer the best data privacy and security, allo
completely control their data and training process.

Lower Latency
For applications where real-time user interaction is crucial, the high latency of GPT-4 c
drawback. When optimized and deployed efficiently, open-source models can offer mu
which makes them more suitable for user interfacing applications.

Innovation and Contribution to AI Development


Open-source LLMs enable companies and developers to contribute to the future of A
control the model's architecture, training data, and training process promotes experim
techniques and strategies. It allows you to stay updated with the latest developments
contribute to the AI community by sharing your models and techniques.

Top open-source LLMs (June 2023)


When it comes to open-source LLMs, there's a variety to choose from, including top o
Instruct and Guanaco-65b. OpenLLM Leaderboard compares text-generative LLMs o
benchmarks.

TEXT GENERATION MODELS:

Model                   OpenLLM Avg Score   Developer                  License
Falcon-Instruct (40B)   63.2                Technology Innovation I…   Apache 2.0
Guanaco (65b)           62.2                timdettmers                LLama Based License
Llama (65b)             58.3                Facebook                   LLama Based License
Falcon-Instruct (7B)    48.8                Technology Innovation I…   Apache 2.0
MPT-Instruct (7B)       48.6                Mosaic ML                  CC-By-SA-3.0
Vicuna (13b)            32.3                lmsys                      LLama Based License
Fastchat-T5 (3B)        -                   lmsys                      Apache 2.0
Flan-T5 XXL             -                   Google                     Apache 2.0

TEXT EMBEDDING MODELS:

MTEB leaderboard similarly compares text-embedding models on different tasks.

Model                  MTEB Average Sco…   Developer                  License
e5-large-v2 (0.3B)     62.25               Microsoft                  MIT
instructor-xl (1.3B)   61.79               NLP Group of The Unive…    Apache 2.0
instructor-large       61.59               NLP Group of The Unive…    Apache 2.0
sentence-Bert (0.1B)   57.78               Nils Reimers and team      Apache 2.0

Building a PDF Knowledge Bot with Open-source LLMs on Shakudo
Solution Overview:
For any textual knowledge base (in our case, PDFs), we first need to extract text snippe
knowledge base and use an embedding model to create a vector store representing th
of the snippets. When a question is asked, we estimate its embedding and find relevan
efficient similarity search from vector stores. After extracting the snippets, we enginee
generate an answer using the LLM generation model. The prompt can be tuned based
used.
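
The flow described above (embed the snippets, embed the question, run a similarity search, then build a prompt) can be sketched in plain Python. This is a toy illustration, not the tutorial's implementation: the bag-of-words `embed` function is a hypothetical stand-in for a real embedding model, the list `vector_store` stands in for a vector database, and the example sentences are invented.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy stand-in for an embedding model: a bag-of-words count vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(question: str, store: list, k: int = 2) -> list:
    """Return the k snippets whose vectors are most similar to the question."""
    q = embed(question)
    ranked = sorted(store, key=lambda item: cosine(q, item[1]), reverse=True)
    return [snippet for snippet, _ in ranked[:k]]


# Invented example snippets, standing in for text extracted from a PDF.
snippets = [
    "The financial crisis was triggered by a collapse in the housing market.",
    "Bananas are rich in potassium.",
    "Banks had taken on large amounts of risky mortgage debt before the crisis.",
]
vector_store = [(s, embed(s)) for s in snippets]   # 1. embed the snippets

question = "What caused the financial crisis?"
context = retrieve(question, vector_store, k=2)    # 2. similarity search

# 3. build the prompt that would be sent to the generation model
prompt = "context: " + " ".join(context) + "\nquestion: " + question + "\nanswer:"
print(prompt)
```

A real embedding model captures meaning rather than word overlap, but the retrieval step has the same shape: rank stored vectors by similarity to the question vector and keep the top k.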

Experimentation and development are crucial elements in the field of data science. Sha
facilitates the selection of the appropriate computing resources. It provides the flexibi
Jupyter Notebooks, VS Code Server (provided by the platform) or connecting via SSH
local editor.

Overview of the PDF chatbot LLM solution

Step 0: Loading LLM Embedding Models and Generative Models


We begin by setting up the models and embeddings that the knowledge bot will use, w
interpreting and processing the text data within the PDFs.

LLM EMBEDDING MODELS

We use the following Open Source models in the codebase:

INSTRUCTOR XL : Instructor XL is an instruction-finetuned text embedding model th
embeddings tailored for any task instruction. The instruction for embedding text sn
the document for retrieval:". The instruction for embedding user questions is "Repr
for retrieving supporting documents:"

SBERT : SBERT maps sentences and paragraphs to vectors using a BERT-like mod
when we’re prototyping our application.

Hugging Face's MTEB leaderboard compares embedding models on different tasks. Ins
very highly on this list, even better than OpenAI's ADA.

EMB_INSTRUCTOR_XL = "hkunlp/instructor-xl"
EMB_SBERT_MPNET_BASE = "sentence-transformers/all-mpnet-base-v2"

LLM GENERATION MODELS

Open source models used in the codebase are:

FlanT5 Models : FlanT5 is a text2text generator that is finetuned on several tasks like
answering questions. It uses the encoder-decoder architecture of transformers. The
2.0 licensed, which can be used commercially.

FastChatT5 3b Model : It's a FlanT5-based chat model trained by fine-tuning FlanT5
ChatGPT. The model is Apache 2.0 licensed.

Falcon7b Model : Falcon7b is a smaller version of Falcon-40b, which is a text genera
only model). Falcon-40B is currently the best open-source model on the OpenLLM
major reason for its high performance is its training with high-quality data.

Open source models used in the codebase

There are other high-performing open-source models (MPT-7B, StableLM, RedPajama
OpenLLM Leaderboard, which can be easily integrated with Hugging Face pipelines.

LLM_FLAN_T5_XXL = "google/flan-t5-xxl"
LLM_FLAN_T5_XL = "google/flan-t5-xl"
LLM_FASTCHAT_T5_XL = "lmsys/fastchat-t5-3b-v1.0"
LLM_FLAN_T5_SMALL = "google/flan-t5-small"
LLM_FLAN_T5_BASE = "google/flan-t5-base"
LLM_FLAN_T5_LARGE = "google/flan-t5-large"
LLM_FALCON_SMALL = "tiiuae/falcon-7b-instruct"


Let’s go ahead and first set up SBERT for the embedding model and FLANT5-Base fo
model. We chose these models because they can run on an 8-core CPU. FastChat-T5 a
7B require a GPU. Loading them is similar and is shown in the codebase:

config = {
    "persist_directory": None,
    "load_in_8bit": False,
    "embedding": EMB_SBERT_MPNET_BASE,
    "llm": LLM_FLAN_T5_BASE,
}


To employ these models, we use Hugging Face pipelines, which simplify the process of
and using them for inference.

For encoder-decoder models like FlanT5, the pipeline’s task is ”text2text-generatio

The auto device map feature assists in efficiently loading the language model (LLM
memory. If the entire model cannot fit in the GPU memory, some layers are loaded o
memory instead. If the model still cannot fit completely, the remaining weights are s
until needed.

Loading in 8-bit quantizes the LLM and can lower the memory requirements by hal
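
A rough back-of-the-envelope check of the 8-bit saving is sketched below. It counts weight storage only and ignores activations, quantization overhead, and any layers kept in higher precision; the 7-billion parameter count and the 16-bit baseline are assumptions chosen for illustration.

```python
def weight_memory_gb(num_params: float, bits_per_param: int) -> float:
    """Approximate memory for model weights only, in gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bits_per_param / 8 / 1e9


params = 7e9  # assumed parameter count for a 7B model
fp16_gb = weight_memory_gb(params, 16)  # 16-bit baseline
int8_gb = weight_memory_gb(params, 8)   # 8-bit quantized weights

print(f"16-bit weights: {fp16_gb:.1f} GB, 8-bit weights: {int8_gb:.1f} GB")
```

Under these assumptions the weights need about half the memory in 8-bit form, which matches the halving mentioned above.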

The creation of the models is governed by the configuration settings and is handled by
create_sbert_mpnet() and create_flan_t5_base() functions, respectively.

import torch
from langchain.embeddings import HuggingFaceEmbeddings
from transformers import AutoTokenizer, pipeline

def create_sbert_mpnet():
    # Use the GPU when one is available, otherwise fall back to the CPU
    device = "cuda" if torch.cuda.is_available() else "cpu"
    return HuggingFaceEmbeddings(model_name=EMB_SBERT_MPNET_BASE, mo

def create_flan_t5_base(load_in_8bit=False):
    # Wrap it in HF pipeline for use with LangChain
    model = "google/flan-t5-base"
    tokenizer = AutoTokenizer.from_pretrained(model)
    return pipeline(
        task="text2text-generation",
        model=model,
        tokenizer=tokenizer,
        max_new_tokens=100,
        model_kwargs={"device_map": "auto", "load_in_8bit": load_in_
    )

# Create the embedding model and the LLM selected in the config
if config["embedding"] == EMB_SBERT_MPNET_BASE:
    embedding = create_sbert_mpnet()
load_in_8bit = config["load_in_8bit"]
if config["llm"] == LLM_FLAN_T5_BASE:
    llm = create_flan_t5_base(load_in_8bit=load_in_8bit)

If we want to load Falcon, the pipeline would be as below and its task is ”text-generatio
decoder-only model. We need to allow remote code execution because the code come
author’s repository and not from Hugging Face.

def create_falcon_instruct_small(load_in_8bit=False):
    model = "tiiuae/falcon-7b-instruct"

    tokenizer = AutoTokenizer.from_pretrained(model)
    hf_pipeline = pipeline(
        task="text-generation",
        model=model,
        tokenizer=tokenizer,
        trust_remote_code=True,
        max_new_tokens=100,
        model_kwargs={
            "device_map": "auto",
            "load_in_8bit": load_in_8bit,
            "max_length": 512,
            "temperature": 0.01,
            "torch_dtype": torch.bfloat16,
        }
    )
    return hf_pipeline

This setup forms the foundation of the knowledge bot's capability to understand and g
to textual input.

Step 1: Ingesting the Data into Vector Store (ChromaDB)


In this step, let’s load our PDF and split it into manageable text snippets.

from langchain.document_loaders import PDFPlumberLoader
from langchain.text_splitter import CharacterTextSplitter, TokenTextSplitter
from langchain.vectorstores import Chroma

# Load the pdf
pdf_path = "wiki_data_short.pdf"
loader = PDFPlumberLoader(pdf_path)
documents = loader.load()

# Split documents and create text snippets
text_splitter = CharacterTextSplitter(chunk_size=100, chunk_overlap=0)
texts = text_splitter.split_documents(documents)
text_splitter = TokenTextSplitter(chunk_size=1000, chunk_overlap=10, enc
texts = text_splitter.split_documents(texts)

# Embed the snippets and store them in the vector store
persist_directory = config["persist_directory"]
vectordb = Chroma.from_documents(documents=texts, embedding=embedding, p
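
To make the `chunk_size` and `chunk_overlap` parameters concrete, here is a simplified character-window splitter. It is an illustrative sketch only: LangChain's splitters work on separators or tokens rather than fixed character windows, and the helper name `split_text` is hypothetical.

```python
def split_text(text: str, chunk_size: int, chunk_overlap: int = 0) -> list:
    """Split text into windows of chunk_size characters, where each window
    shares chunk_overlap characters with the previous one."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far the window advances each time
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]


chunks = split_text("abcdefghij", chunk_size=4, chunk_overlap=1)
print(chunks)
```

A small overlap keeps a sentence that straddles a chunk boundary at least partly present in both neighbouring snippets, at the cost of storing some text twice.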

Step 2: Retrieving Snippets and Prompt Engineering


Now, we retrieve relevant snippets based on question embeddings and then construct
the LLM.

from langchain.chains import RetrievalQA
from langchain.llms import HuggingFacePipeline
from langchain.prompts import PromptTemplate

hf_llm = HuggingFacePipeline(pipeline=llm)
retriever = vectordb.as_retriever(search_kwargs={"k":4})
qa = RetrievalQA.from_chain_type(llm=hf_llm, chain_type="stuff",retrieve

# Defining a default prompt for flan models
if config["llm"] == LLM_FLAN_T5_SMALL or config["llm"] == LLM_FLAN_T5_BA
    question_t5_template = """
    context: {context}
    question: {question}
    answer:
    """
    QUESTION_T5_PROMPT = PromptTemplate(
        template=question_t5_template, input_variables=["context", "ques
    )
    qa.combine_documents_chain.llm_chain.prompt = QUESTION_T5_PROMPT
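
The "stuff" chain type places all retrieved snippets into a single prompt. The sketch below shows that idea with plain string formatting, using the same context/question/answer layout as the template above. The helper name `build_prompt`, the blank-line separator, and the placeholder snippets are assumptions for illustration, not LangChain internals.

```python
TEMPLATE = "context: {context}\nquestion: {question}\nanswer:"


def build_prompt(snippets: list, question: str) -> str:
    """Concatenate ("stuff") all retrieved snippets into one context block
    and fill the prompt template."""
    context = "\n\n".join(snippets)
    return TEMPLATE.format(context=context, question=question)


prompt = build_prompt(
    ["Snippet one.", "Snippet two."],  # placeholder retrieved snippets
    "what's the reason for financial crisis?",
)
print(prompt)
```

Because every snippet ends up in one prompt, the number of retrieved snippets (k) and the chunk size together have to fit within the model's context length.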

Step 3: Querying the LLM



Finally, we query the LLM using our question. The PDF knowledge bot will return the r
extracted from the PDF.

question = "what's the reason for financial crisis?"

qa.combine_documents_chain.verbose = True
qa.return_source_documents = True
qa({"query":question,})

PACKAGING INTO A CLASS

To make the code more organized, we can encapsulate all functionalities into a class.

class PdfQA:
    def __init__(self, config: dict = None):
        # Avoid sharing one mutable default dict between instances
        self.config = config if config is not None else {}
        self.embedding = None
        self.vectordb = None
        self.llm = None
        self.qa = None
        self.retriever = None

    ...
    # Check out the full script on the GitHub link in the intro

We can now initialize and run the PdfQA class with the following code:

# Configuration for PdfQA
config = {
    "persist_directory": None,
    "load_in_8bit": False,
    "embedding": EMB_SBERT_MPNET_BASE,
    "llm": LLM_FLAN_T5_BASE,
    "pdf_path": "wiki_data_short.pdf",
}

# Initialize PdfQA
pdfqa = PdfQA(config=config)
pdfqa.init_embeddings()
pdfqa.init_models()

# Create Vector DB
pdfqa.vector_db_pdf()

# Set up Retrieval QA Chain
pdfqa.retreival_qa_chain()

# Query the model
question = "what's the reason for financial crisis?"
pdfqa.answer_query(question)

Step 4: Building the Streamlit app


Shakudo integrates with various tools you can choose to build your front end. For this
web application around our PdfQA class with Streamlit, a Python library that simplifies

Below is the code breakdown:

We start by importing the necessary modules

import streamlit as st
from pdf_qa import PdfQA
from pathlib import Path
from tempfile import NamedTemporaryFile
import time
import shutil
from constants import * ## constants.py file can be found in code


Now, let’s set the page configuration and have a session state of the class to avoid inst
multiple times in the same session.

# Streamlit app code
st.set_page_config(
    page_title='Q&A Bot for PDF',
    page_icon='🔖',
    layout='wide',
    initial_sidebar_state='auto',
)

# Create the PdfQA object once per session
if "pdf_qa_model" not in st.session_state:
    st.session_state["pdf_qa_model"] = PdfQA()  # Initialisation

To load the model and embedding on the GPU or CPU only once across all the client se
the LLM and embedding pipelines.

## To cache resource across multiple session
@st.cache_resource
def load_llm(llm, load_in_8bit):
    if llm == LLM_OPENAI_GPT35:
        pass
    elif llm == LLM_FLAN_T5_SMALL:
        return PdfQA.create_flan_t5_small(load_in_8bit)
    elif llm == LLM_FLAN_T5_BASE:
        return PdfQA.create_flan_t5_base(load_in_8bit)
    elif llm == LLM_FLAN_T5_LARGE:
        return PdfQA.create_flan_t5_large(load_in_8bit)
    elif llm == LLM_FASTCHAT_T5_XL:
        return PdfQA.create_fastchat_t5_xl(load_in_8bit)
    elif llm == LLM_FALCON_SMALL:
        return PdfQA.create_falcon_instruct_small(load_in_8bit)
    else:
        raise ValueError("Invalid LLM setting")

## To cache resource across multiple session
@st.cache_resource
def load_emb(emb):
    if emb == EMB_INSTRUCTOR_XL:
        return PdfQA.create_instructor_xl()
    elif emb == EMB_SBERT_MPNET_BASE:
        return PdfQA.create_sbert_mpnet()
    elif emb == EMB_SBERT_MINILM:
        pass  ## ChromaDB takes care
    else:
        raise ValueError("Invalid embedding setting")
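
The caching pattern in these loaders can be tried outside Streamlit. In the sketch below, `functools.lru_cache` stands in for `st.cache_resource` (an analogy, not the same mechanism), a dictionary lookup replaces the if/elif chain, and the two factory functions are hypothetical stubs that return labels instead of loading real models.

```python
from functools import lru_cache


# Hypothetical stand-ins for the real factory functions; each returns a label
# instead of loading a model so the sketch runs anywhere.
def create_flan_t5_base(load_in_8bit: bool) -> str:
    return f"flan-t5-base(8bit={load_in_8bit})"


def create_falcon_instruct_small(load_in_8bit: bool) -> str:
    return f"falcon-7b-instruct(8bit={load_in_8bit})"


# Map each model identifier to the function that builds it.
LLM_FACTORIES = {
    "google/flan-t5-base": create_flan_t5_base,
    "tiiuae/falcon-7b-instruct": create_falcon_instruct_small,
}

load_count = 0  # counts real loads, to show the cache working


@lru_cache(maxsize=None)  # stands in for st.cache_resource in this sketch
def load_llm(llm: str, load_in_8bit: bool) -> str:
    global load_count
    try:
        factory = LLM_FACTORIES[llm]
    except KeyError:
        raise ValueError("Invalid LLM setting") from None
    load_count += 1
    return factory(load_in_8bit)


first = load_llm("google/flan-t5-base", False)
second = load_llm("google/flan-t5-base", False)  # served from the cache
print(first, second, load_count)
```

The second call with the same arguments returns the cached object without running the factory again, which is the behaviour that keeps a large model from being reloaded for every client session.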

Create our Streamlit app sidebar to include radio buttons for model selection and a file
file is submitted, it triggers the model loading and PDF ingestion to create a vector sto

with st.sidebar:
    emb = st.radio("**Select Embedding Model**", [EMB_INSTRUCTOR_XL, EMB
    llm = st.radio("**Select LLM Model**", [LLM_FASTCHAT_T5_XL, LLM_FLAN
    load_in_8bit = st.radio("**Load 8 bit**", [True, False],index=1)
    pdf_file = st.file_uploader("**Upload PDF**", type="pdf")

    if st.button("Submit") and pdf_file is not None:
        with st.spinner(text="Uploading PDF and Generating Embeddings.."
            # Copy the upload to a temporary file so it can be read by path
            with NamedTemporaryFile(delete=False, suffix='.pdf') as tmp:
                shutil.copyfileobj(pdf_file, tmp)
                tmp_path = Path(tmp.name)
            # The temporary file is closed here, so its contents are on disk
            st.session_state["pdf_qa_model"].config = {
                "pdf_path": str(tmp_path),
                "embedding": emb,
                "llm": llm,
                "load_in_8bit": load_in_8bit
            }
            st.session_state["pdf_qa_model"].embedding = load_emb(em
            st.session_state["pdf_qa_model"].llm = load_llm(llm,load
            st.session_state["pdf_qa_model"].init_embeddings()
            st.session_state["pdf_qa_model"].init_models()
            st.session_state["pdf_qa_model"].vector_db_pdf()
            st.sidebar.success("PDF uploaded successfully")
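
The temporary-file step can be exercised on its own. In this sketch an `io.BytesIO` object stands in for the file-like object returned by `st.file_uploader`, and the byte string is invented sample content; the cleanup at the end is our addition, since `delete=False` leaves the file on disk.

```python
import io
import shutil
from pathlib import Path
from tempfile import NamedTemporaryFile

# io.BytesIO stands in for the uploaded file object in this sketch.
uploaded = io.BytesIO(b"%PDF-1.4 example bytes")

with NamedTemporaryFile(delete=False, suffix=".pdf") as tmp:
    shutil.copyfileobj(uploaded, tmp)  # stream the upload to disk
    tmp_path = Path(tmp.name)
# Leaving the with-block closes the file, so all bytes are flushed to disk.

size_on_disk = tmp_path.stat().st_size
print(tmp_path.suffix, size_on_disk)

tmp_path.unlink()  # delete=False means cleanup is our responsibility
```

Writing the upload to a named file is what lets a path-based loader such as the PDF loader read it; reading the path only after the file is closed avoids picking up a partially written file.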

Add a text input box for the question. Once we submit the question, it triggers the retr
snippets from the vector store and queries the LLM with an appropriate prompt.

question = st.text_input('Ask a question', 'What is this document?')

if st.button("Answer"):
    try:
        st.session_state["pdf_qa_model"].retreival_qa_chain()
        answer = st.session_state["pdf_qa_model"].answer_query(question)
        st.write(f"{answer}")
    except Exception as e:
        st.error(f"Error answering the question: {str(e)}")


This user interface allows the user to upload a PDF file, choose the model to use and a

Step 5: Deploying with Shakudo

High-level diagram of deploying llm app on Shakudo

Finally, our app is ready, and we can deploy it as a service on Shakudo. The platform ma
process easier, allowing you to put your application online quickly.

Deploying applications on Shakudo offers enhanced security and control. Unlike many
Shakudo locks your application behind the SSO or your organization. The services and
models run entirely within your cloud tenancy and on your dedicated Shakudo cluster
the flexibility to avoid vendor lock-in and enabling you to retain control over your appl
the cloud

To deploy your app on Shakudo, we need two key files: pipeline.yaml, which describes
pipeline, and run.sh, a bash script to set up and run our application. Here's what these fi

‘pipeline.yaml’:

pipeline:
  name: "QA demo"
  tasks:
  - name: "QA app"
    type: "bash script"
    port: 8787
    bash_script_path: "LLM/QA_app/run_qa.sh"

‘run.sh’:

PROJECT_DIR="$(cd -P "$(dirname "${BASH_SOURCE[0]}")" && pwd)"

cd "$PROJECT_DIR"

export PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION=python
export STREAMLIT_RUNONSSAVE=True

pip install -r requirements.txt

streamlit run streamlit_app_blog.py --server.port 8787 --browser.serverA

In this script:

Set the project directory and navigate into it.

Install the necessary Python libraries from the requirements.txt file.

Run the Streamlit app on port 8787.

Now, our application is live! We can browse through the user interface to see how it wo

LLM app Streamlit UI

Shakudo Services not only simplifies the deployment of your applications but also has
to security. Deploying your models within your Virtual Private Cloud (VPC) is one of the
of hosting models, as it isolates them from the public internet and provides better cont

Conclusion
In this tutorial, we described the advantages of using open-source LLMs over Comme
showed how to integrate OSS LLMs Falcon, FastChat, and FlanT5 to query the interna
with the help of Hugging Face pipelines and LangChain.

Hosting and managing open-source LLMs can be a complex and challenging task. Sha
infrastructure, saving time, resources, and expertise. For a first-hand experience of our
encourage you to reach out to our team and book a demo.

To learn about practical applications with OpenAI APIs, we recommend read
post about "Building a Confluence Q&A App with LangChain and ChatGPT" where we s
world use case, a chatbot to query your Confluence directories.

RESOURCES:

* The code is adapted based on the work in LLM-WikipediaQA, where the author com
Flan-T5 with ChatGPT running a Q&A on Wikipedia Articles.
