Leveraging the Power of ChatGPT
and Vector Databases in the
FreeBSD Expert System
Yan-Hao Wang, AsiaBSDCon 2024
Who Am I
My name is Yan-Hao Wang. I am a senior high school student in Taiwan and have been a FreeBSD
Taiwan intern since 2022.
I've been involved in various tasks, such as:
1. Developing an online document/man-page editor.
2. Crafting tests for command utilities like gunion(8) and printenv(1).
3. Translating FreeBSD documents.
GitHub Repository
All the code has been uploaded to the freebsd_data repository, and the slides will be
posted there as well. If you're interested, you can find them there.
Outline
1. Introduction to Expert Systems
2. Introduction to ChatGPT
3. Development Process
a. Data Cleaning and Extraction
b. Embedding Model and Vector Database
c. Integration with ChatGPT
4. OpenAI GPTs as Potential Replacements
5. Summary
Expert System
An expert system is a system that can accurately answer user questions in a specific
domain. It consists of two parts:
1. Knowledge Base: stores all the relevant information about the domain of
expertise.
2. Rule Engine: contains rules predefined by data scientists. It processes
the user's questions and applies the rules to generate accurate responses.
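As a minimal sketch of how the two parts fit together, the Python below keeps a tiny knowledge base in a dictionary and lets a rule engine apply the first matching predefined rule. The `Rule` type, the `keyword_rule` helper, and the sample entries are hypothetical illustrations, not code from the freebsd_data repository.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class Rule:
    """A predefined rule: a condition on the question and a way to build the answer."""
    name: str
    matches: Callable[[str], bool]
    respond: Callable[[str, Dict[str, str]], str]


def keyword_rule(keyword: str) -> Rule:
    """Hypothetical rule: if the question mentions `keyword`, answer from that KB entry."""
    return Rule(
        name=f"mentions-{keyword}",
        matches=lambda question: keyword in question.lower(),
        respond=lambda question, kb: kb[keyword],
    )


def answer(question: str, kb: Dict[str, str], rules: List[Rule]) -> Optional[str]:
    """Rule engine: apply the first matching rule; None means no rule applies."""
    for rule in rules:
        if rule.matches(question):
            return rule.respond(question, kb)
    return None


# Knowledge base: illustrative entries only.
KNOWLEDGE_BASE: Dict[str, str] = {
    "gunion": "See the gunion(8) manual page.",
    "printenv": "See the printenv(1) manual page.",
}
RULES = [keyword_rule(k) for k in KNOWLEDGE_BASE]

if __name__ == "__main__":
    print(answer("How do I use gunion?", KNOWLEDGE_BASE, RULES))
    print(answer("What is ZFS?", KNOWLEDGE_BASE, RULES))
```

Real rule engines use far richer conditions; the point here is only the split between stored knowledge and the rules that apply it.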
Expert System
Modern expert systems use machine learning to simulate the behavior or
judgment of domain experts.
ChatGPT
ChatGPT (Chat Generative Pre-trained Transformer) is a chatbot developed by
OpenAI and launched on November 30, 2022. It is based on a large language model
(LLM).
FreeBSD Expert System
There are multiple ways to build a FreeBSD expert system.
1. Train a new ML model with FreeBSD data.
No. I am not an ML expert, and it would cost a lot.
2. Use an existing model such as ChatGPT.
However, ChatGPT by itself cannot be called a FreeBSD expert system.
FreeBSD Expert System
ChatGPT is trained on a huge amount of data, so it can answer questions in almost every
domain, although its answers may not be correct. It is more of a general-purpose system.
Why ChatGPT cannot be called a FreeBSD expert system:
1. ChatGPT tends to hallucinate answers when asked about unfamiliar
domains.
2. Its training data is not recent (ChatGPT was trained on data up to 2021), so it
cannot answer questions about the newest developments.
FreeBSD Expert System
There are two ways to address these limitations.
1. Fine-tuning: take a model that has already been trained for one task and
further tune it so that it performs a second, similar task.
OpenAI provides an API for fine-tuning. For open-source models, you can use PyTorch
or TensorFlow.
FreeBSD Expert System
However, fine-tuning is still hard for developers unfamiliar with AI, and it is also expensive.
The second way is Retrieval-Augmented Generation (RAG). It works much like giving
ChatGPT related information along with your question, which lets it produce a much more
accurate response; RAG automates this by retrieving the relevant documents for each question.
This approach is practical for us, so we use an embedding model and a vector
database to implement it.
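The retrieval step feeds the generation step through the prompt. A minimal sketch, assuming the retrieved documents arrive as plain strings ranked best-first: the function below prepends the top documents to the user's question. The instruction wording and the `max_docs=3` cutoff are assumptions; the resulting string would then be sent as the user message to the ChatGPT API.

```python
from typing import List


def build_rag_prompt(question: str, retrieved_docs: List[str], max_docs: int = 3) -> str:
    """Augment the user's question with the top retrieved documents."""
    context = "\n\n".join(
        f"[Document {i}]\n{doc.strip()}"
        for i, doc in enumerate(retrieved_docs[:max_docs], start=1)
    )
    # Hypothetical instruction text that asks the model to stay within the context.
    return (
        "Answer the question using only the FreeBSD documentation below. "
        "If the documentation does not contain the answer, say so.\n\n"
        f"{context}\n\n"
        f"Question: {question}\nAnswer:"
    )


if __name__ == "__main__":
    docs = ["gunion(8) manual page text ...", "FreeBSD status report on gunion ..."]
    print(build_rag_prompt("How to use the gunion command in FreeBSD?", docs))
```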
Embedding Model
An embedding model is a type of ML model that converts input data, such as words or sentences,
into numerical representations called embedding vectors (or simply vectors).
These embeddings capture the semantic meaning or context of the input data in a
continuous vector space, so texts with similar meanings map to nearby vectors. They can be
used for tasks such as text classification and sentiment analysis.
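To show only the shape of the mapping (text in, fixed-length vector out), here is a toy stand-in that uses feature hashing. It is an assumption for illustration and is not semantic: unlike gte-base, it only reflects shared words. With the sentence-transformers library, the real model would instead be loaded with something like `SentenceTransformer("thenlper/gte-base")` and applied with its `encode` method.

```python
import hashlib
import math
import re
from typing import List


def toy_embed(text: str, dim: int = 64) -> List[float]:
    """Map text to a fixed-length, L2-normalized vector via feature hashing.

    Illustration only: it captures shared words, not meaning, unlike gte-base.
    """
    vec = [0.0] * dim
    for token in re.findall(r"[a-z0-9]+", text.lower()):
        # A stable hash (Python's built-in hash() is randomized per process).
        digest = hashlib.sha256(token.encode("utf-8")).digest()
        vec[int.from_bytes(digest[:4], "little") % dim] += 1.0
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else vec


if __name__ == "__main__":
    a = toy_embed("How to use the gunion command")
    b = toy_embed("gunion command usage")
    # Dot product of unit vectors = cosine similarity; shared words push it up.
    print(sum(x * y for x, y in zip(a, b)))
```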
Vector Database
Vector databases are designed to store vectors efficiently and use various search
algorithms to find the vectors most similar to a query.
Numerous open-source vector databases are available to choose from.
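At its core, a vector database answers "which stored vectors are closest to this query vector?". A minimal in-memory sketch of that operation with an exact, brute-force scan is below; the class name and the choice of Euclidean distance are assumptions, and production vector databases typically use approximate indexes to avoid scanning every vector.

```python
import heapq
import math
from typing import Dict, List, Tuple


class InMemoryVectorStore:
    """Toy vector store: keeps vectors in a dict and scans them all on each query."""

    def __init__(self) -> None:
        self._vectors: Dict[str, List[float]] = {}

    def add(self, doc_id: str, vector: List[float]) -> None:
        self._vectors[doc_id] = vector

    def search(self, query: List[float], k: int = 3) -> List[Tuple[str, float]]:
        """Exact top-k search by Euclidean distance (smaller distance = more similar)."""
        scored = ((doc_id, math.dist(query, vec)) for doc_id, vec in self._vectors.items())
        return heapq.nsmallest(k, scored, key=lambda item: item[1])


if __name__ == "__main__":
    store = InMemoryVectorStore()
    store.add("doc-a", [0.0, 0.0])
    store.add("doc-b", [1.0, 1.0])
    store.add("doc-c", [5.0, 5.0])
    print(store.search([0.9, 0.9], k=2))
```

For L2-normalized vectors, ranking by Euclidean distance gives the same order as ranking by cosine similarity, because ||a - b||^2 = 2 - 2 cos(a, b).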
Development Process
Development - Architecture
Development - Data Extraction
We use a simple find command to extract the data. Because the data sources come in very
different formats, we need to convert them to plain text; we use the “hs-pandoc” package (pandoc) for the conversion.
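A sketch of this step in Python, assuming HTML and Markdown sources: it walks the tree the way `find` would and builds one `pandoc -f FORMAT -t plain SRC -o OUT` call per file. The extension-to-format table and the mirrored output layout are assumptions; the commands only run when `dry_run=False` and pandoc (for example from the hs-pandoc package) is installed.

```python
import subprocess
from pathlib import Path
from typing import Dict, List

# Assumed mapping from file extension to pandoc input format; extend for other sources.
PANDOC_FORMATS: Dict[str, str] = {".html": "html", ".md": "markdown"}


def find_sources(root: Path) -> List[Path]:
    """Roughly `find root -type f`, filtered to the extensions in PANDOC_FORMATS."""
    return sorted(p for p in root.rglob("*") if p.is_file() and p.suffix in PANDOC_FORMATS)


def pandoc_command(src: Path, root: Path, out_dir: Path) -> List[str]:
    """Build a pandoc call converting one source file to plain text."""
    # Mirror the source tree under out_dir so same-named files do not collide.
    out = out_dir / src.relative_to(root).with_suffix(".txt")
    return ["pandoc", "-f", PANDOC_FORMATS[src.suffix], "-t", "plain", str(src), "-o", str(out)]


def convert_all(root: Path, out_dir: Path, dry_run: bool = True) -> List[List[str]]:
    """Collect every conversion command; run them only when dry_run is False."""
    commands = [pandoc_command(src, root, out_dir) for src in find_sources(root)]
    if not dry_run:
        for cmd in commands:
            Path(cmd[-1]).parent.mkdir(parents=True, exist_ok=True)
            subprocess.run(cmd, check=True)  # requires the pandoc binary on PATH
    return commands
```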
Development - Data Cleaning
Next, we remove unrelated information, again using a simple find command to delete the unrelated data.
Development - Data Cleaning
Data cleaning is often the most time-consuming step: by some estimates, data scientists spend
about 60% of their time cleaning data rather than generating insights.
There are tools that can help clean the data, such as
OpenRefine.
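The line-level part of cleaning can be sketched as a regex filter. The patterns below (blank lines, navigation words, bare page numbers) are hypothetical examples of "unrelated text"; the real rules depend on the sources, and tools such as OpenRefine help discover them.

```python
import re
from typing import Iterable, List

# Hypothetical patterns for lines that carry no FreeBSD knowledge.
UNRELATED_PATTERNS = [
    re.compile(r"^\s*$"),  # blank or whitespace-only lines
    re.compile(r"^\s*(table of contents|prev|next|home)\s*$", re.IGNORECASE),  # navigation residue
    re.compile(r"^\s*\d+\s*$"),  # bare page numbers
]


def clean_lines(lines: Iterable[str]) -> List[str]:
    """Drop lines matching any unrelated-text pattern and strip trailing whitespace."""
    return [
        line.rstrip()
        for line in lines
        if not any(pattern.search(line) for pattern in UNRELATED_PATTERNS)
    ]
```

Because the navigation pattern must match the whole line, a real sentence such as "Next steps for gunion" is kept.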
Development - Embedding Model
OpenAI offers an embedding model API, and many open-source embedding
models are also available online. In this project, we use the open-source model “gte-base”.
[Figures: MTEB Leaderboard (from Hugging Face); OpenAI embedding model]
Development - Embedding Model
[Figure: each document is encoded into a vector, e.g. vector 1 = [0.2, 0.3 … 2.3], vector 2 = [0.3, 0.6 … 1.7], vector 3 = [0.9, 0.1 … 3.1]]
Development - Embedding Model
We use “gte-base” as our model. Its model size is only 0.22 GB, small enough for my modest
GPU (NVIDIA GeForce GTX 1050 Ti) to handle.
Embedding all the documents uses only 590 MB of GPU memory and takes 67 minutes.
Development - Embedding Model
There are multiple factors (hyperparameters) we can tune here, for example:
1. The length of the sentences (text chunks).
2. Which metadata to keep.
3. Which model to use, and whether to fine-tune the embedding model.
Each of these hyperparameters should be tried several times to find the best setting.
The best setting differs from field to field, as the No Free Lunch (NFL) theorems suggest.
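The first hyperparameter, text length, is usually controlled by how documents are split before embedding. A minimal sketch, assuming fixed-size word windows with overlap (the source does not say which splitting method the project used); `max_words` and `overlap` are the knobs one would tune.

```python
from typing import List


def chunk_text(text: str, max_words: int = 100, overlap: int = 20) -> List[str]:
    """Split text into windows of at most max_words words, overlapping by `overlap` words."""
    if not 0 <= overlap < max_words:
        raise ValueError("need 0 <= overlap < max_words")
    words = text.split()
    step = max_words - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + max_words]))
        if start + max_words >= len(words):
            break  # this window already reached the end of the text
    return chunks
```

Shorter windows give more focused vectors but lose surrounding context; the overlap keeps a sentence that straddles a boundary retrievable from at least one chunk.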
Development - Vector Database
As mentioned earlier, there are many vector databases to choose from.
In our local tests, however, we simply store the vectors in a file and use a simple cosine
similarity search, because our data is small (< 100 MB).
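A minimal sketch of this file-based setup, assuming the vectors are kept in a JSON file keyed by document ID (the source says only "a file"): the query loads every vector, scores it with cosine similarity, and returns the top k. The function names and file layout are hypothetical.

```python
import json
import math
from pathlib import Path
from typing import Dict, List, Tuple


def cosine_similarity(a: List[float], b: List[float]) -> float:
    """cos(a, b) = a·b / (|a| |b|); returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norms = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norms if norms else 0.0


def save_vectors(path: Path, vectors: Dict[str, List[float]]) -> None:
    path.write_text(json.dumps(vectors), encoding="utf-8")


def query(path: Path, query_vec: List[float], k: int = 3) -> List[Tuple[str, float]]:
    """Load all stored vectors and return the k most similar (highest cosine first)."""
    vectors: Dict[str, List[float]] = json.loads(path.read_text(encoding="utf-8"))
    scored = [(doc_id, cosine_similarity(query_vec, vec)) for doc_id, vec in vectors.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:k]
```

A full scan is fine at this size; a dedicated vector database becomes worthwhile as the corpus grows.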
Development - Query
Question: How to use the gunion command in FreeBSD?
Query result:
1. Man page of gunion
2. Man page of gunion
3. FreeBSD status report (A New GEOM Facility, gunion)
4. Unrelated info …
Development - Query
[Figure: TOP1, TOP2, and TOP3 query results]
Development - Integration with ChatGPT
Next, we need to host the embedding model and vector database behind an open
API that users can call, and then integrate that API with ChatGPT.
1. The first way is simple: we write Python code that calls both the ChatGPT API and
our API. However, this is not friendly to ordinary users.
2. Develop a ChatGPT plugin. A plugin lets us register our API with ChatGPT; when a user
asks ChatGPT a question, it calls the API and uses the response.
This is the best fit for our project, because the user only needs to enable the
plugin in ChatGPT.
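The "open API" that both approaches rely on can be sketched with Python's standard library. The `/query` endpoint, its `q` and `k` parameters, and the injected `retrieve` function are hypothetical; in the real system `retrieve` would embed the question with gte-base and search the stored vectors. `http.server` is not recommended for production, so this is only a local prototype.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer
from typing import Callable, List, Type
from urllib.parse import parse_qs, urlparse

# Hypothetical retrieval function: (question, k) -> top-k document snippets.
Retriever = Callable[[str, int], List[str]]


def make_handler(retrieve: Retriever) -> Type[BaseHTTPRequestHandler]:
    class QueryHandler(BaseHTTPRequestHandler):
        def do_GET(self) -> None:
            url = urlparse(self.path)
            if url.path != "/query":
                self.send_error(404, "unknown endpoint")
                return
            params = parse_qs(url.query)
            question = params.get("q", [""])[0]
            try:
                k = int(params.get("k", ["3"])[0])
            except ValueError:
                self.send_error(400, "k must be an integer")
                return
            payload = {"question": question, "results": retrieve(question, k)}
            body = json.dumps(payload).encode("utf-8")
            self.send_response(200)
            self.send_header("Content-Type", "application/json")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)

        def log_message(self, format: str, *args) -> None:
            pass  # silence per-request logging for this sketch

    return QueryHandler


def make_server(retrieve: Retriever, host: str = "127.0.0.1", port: int = 8000) -> HTTPServer:
    """Create (but do not start) the server; call serve_forever() to run it."""
    return HTTPServer((host, port), make_handler(retrieve))


if __name__ == "__main__":
    # Placeholder retriever; the real one would embed the question and search the vectors.
    make_server(lambda q, k: [f"placeholder result for {q!r}"][:k]).serve_forever()
```

Either the Python script from the first approach or a plugin would then send requests such as `GET /query?q=...&k=3` to this service.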
OpenAI GPTs as Potential Replacements
GPTs were launched in November 2023. They provide an easy way to create a
custom GPT from any data you have, which makes them a potential replacement for
our project: we only need to upload the data from step 1 to get a custom
expert system.
On March 19, 2024, you will no longer be able to install new plugins or create new
conversations with existing plugins.
Wikimedia Future Audiences
The idea was inspired by Wikimedia, which had already developed a ChatGPT plugin but
also stopped the project after GPTs were released.
This timing also coincides with OpenAI’s move away from the plugin marketplace
for ChatGPT, and towards no/low-code customizable GPTs. This shift has made
our plugin in its current form inaccessible to new users and largely redundant.
While we could repurpose this functionality towards being a GPT, we don’t believe
we would learn significantly more beyond how to create a product within the
OpenAI ecosystem.
Lessons learned, ChatGPT has not become the new information seeking
paradigm (yet?).
Summary
Solution      | RAG                  | GPTs (Custom GPT)    | ChatGPT Plus (browse internet)
Cost          | Medium ~ Hard        | Small                | Small
Advantages    | Privacy, flexibility | Fast                 | Fast
Disadvantages | Cost                 | Privacy, flexibility | Data sources are different, flexibility
Summary
The importance of LLMs is poised to grow rapidly in the future, marking
a pivotal shift in our technological landscape.
Although we may not complete the entire production process, it is still worthwhile
to follow future trends and try to combine them with FreeBSD.
References
● What is an Expert System?
● Do data scientists spend 80% of their time cleaning data? Turns out, no?
● Wiki, Talk:Future Audiences
Ad

More Related Content

Similar to wang-Leveraging-the-Power-of-ChatGPT-and-Vector-Databases-in-the-FreeBSD-Expert-System-slides.pdf (20)

ChatGPT usage in software development - curse or boon.pdf
ChatGPT usage in software development - curse or boon.pdfChatGPT usage in software development - curse or boon.pdf
ChatGPT usage in software development - curse or boon.pdf
Laura Miller
 
summer file - Copy
summer file - Copysummer file - Copy
summer file - Copy
Rakesh Kumar
 
AI in Drupal: Evolution, Modules and Possibilities
AI in Drupal: Evolution, Modules and PossibilitiesAI in Drupal: Evolution, Modules and Possibilities
AI in Drupal: Evolution, Modules and Possibilities
Jorge López-Lago
 
Machine Learning
Machine LearningMachine Learning
Machine Learning
Ramiro Aduviri Velasco
 
Past, Present and Future of Generative AI
Past, Present and Future of Generative AIPast, Present and Future of Generative AI
Past, Present and Future of Generative AI
abhishek36461
 
All in AI: LLM Landscape & RAG in 2024 with Mark Ryan (Google) & Jerry Liu (L...
All in AI: LLM Landscape & RAG in 2024 with Mark Ryan (Google) & Jerry Liu (L...All in AI: LLM Landscape & RAG in 2024 with Mark Ryan (Google) & Jerry Liu (L...
All in AI: LLM Landscape & RAG in 2024 with Mark Ryan (Google) & Jerry Liu (L...
Daniel Zivkovic
 
IRJET - A Study on Building a Web based Chatbot from Scratch
IRJET - A Study on Building a Web based Chatbot from ScratchIRJET - A Study on Building a Web based Chatbot from Scratch
IRJET - A Study on Building a Web based Chatbot from Scratch
IRJET Journal
 
Igniting Next Level Productivity with AI-Infused Data Integration Workflows
Igniting Next Level Productivity with AI-Infused Data Integration WorkflowsIgniting Next Level Productivity with AI-Infused Data Integration Workflows
Igniting Next Level Productivity with AI-Infused Data Integration Workflows
Safe Software
 
Igniting Next Level Productivity with AI-Infused Data Integration Workflows
Igniting Next Level Productivity with AI-Infused Data Integration Workflows Igniting Next Level Productivity with AI-Infused Data Integration Workflows
Igniting Next Level Productivity with AI-Infused Data Integration Workflows
Safe Software
 
ChatGPT and AI for web developers - Maximiliano Firtman
ChatGPT and AI for web developers - Maximiliano FirtmanChatGPT and AI for web developers - Maximiliano Firtman
ChatGPT and AI for web developers - Maximiliano Firtman
Wey Wey Web
 
Java and graal vm to easily deploy your machine learning services
Java and graal vm to easily deploy your machine learning servicesJava and graal vm to easily deploy your machine learning services
Java and graal vm to easily deploy your machine learning services
Philippe Gottfrois
 
MuleSoft + Augmented Reality & ChatGPT
MuleSoft + Augmented Reality & ChatGPTMuleSoft + Augmented Reality & ChatGPT
MuleSoft + Augmented Reality & ChatGPT
MuleSoft Meetups
 
OSMC 2023 | Experiments with OpenSearch and AI by Jochen Kressin & Leanne La...
OSMC 2023 | Experiments with OpenSearch and AI by Jochen Kressin &  Leanne La...OSMC 2023 | Experiments with OpenSearch and AI by Jochen Kressin &  Leanne La...
OSMC 2023 | Experiments with OpenSearch and AI by Jochen Kressin & Leanne La...
NETWAYS
 
Distributed Tracing
Distributed TracingDistributed Tracing
Distributed Tracing
distributedtracing
 
Connector Corner: Extending LLM automation use cases with UiPath GenAI connec...
Connector Corner: Extending LLM automation use cases with UiPath GenAI connec...Connector Corner: Extending LLM automation use cases with UiPath GenAI connec...
Connector Corner: Extending LLM automation use cases with UiPath GenAI connec...
DianaGray10
 
SPOTLIGHT IGNITE (10 MINUTES): THE FUTURE OF DEVELOPER TOOLS: FROM STACKOVERF...
SPOTLIGHT IGNITE (10 MINUTES): THE FUTURE OF DEVELOPER TOOLS: FROM STACKOVERF...SPOTLIGHT IGNITE (10 MINUTES): THE FUTURE OF DEVELOPER TOOLS: FROM STACKOVERF...
SPOTLIGHT IGNITE (10 MINUTES): THE FUTURE OF DEVELOPER TOOLS: FROM STACKOVERF...
DevOpsDays Tel Aviv
 
Company Visitor Management System Report.docx
Company Visitor Management System Report.docxCompany Visitor Management System Report.docx
Company Visitor Management System Report.docx
fantabulous2024
 
hari_duche_updated
hari_duche_updatedhari_duche_updated
hari_duche_updated
Hari Duche
 
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
Fwdays
 
Software Modeling and Artificial Intelligence: friends or foes?
Software Modeling and Artificial Intelligence: friends or foes?Software Modeling and Artificial Intelligence: friends or foes?
Software Modeling and Artificial Intelligence: friends or foes?
Jordi Cabot
 
ChatGPT usage in software development - curse or boon.pdf
ChatGPT usage in software development - curse or boon.pdfChatGPT usage in software development - curse or boon.pdf
ChatGPT usage in software development - curse or boon.pdf
Laura Miller
 
summer file - Copy
summer file - Copysummer file - Copy
summer file - Copy
Rakesh Kumar
 
AI in Drupal: Evolution, Modules and Possibilities
AI in Drupal: Evolution, Modules and PossibilitiesAI in Drupal: Evolution, Modules and Possibilities
AI in Drupal: Evolution, Modules and Possibilities
Jorge López-Lago
 
Past, Present and Future of Generative AI
Past, Present and Future of Generative AIPast, Present and Future of Generative AI
Past, Present and Future of Generative AI
abhishek36461
 
All in AI: LLM Landscape & RAG in 2024 with Mark Ryan (Google) & Jerry Liu (L...
All in AI: LLM Landscape & RAG in 2024 with Mark Ryan (Google) & Jerry Liu (L...All in AI: LLM Landscape & RAG in 2024 with Mark Ryan (Google) & Jerry Liu (L...
All in AI: LLM Landscape & RAG in 2024 with Mark Ryan (Google) & Jerry Liu (L...
Daniel Zivkovic
 
IRJET - A Study on Building a Web based Chatbot from Scratch
IRJET - A Study on Building a Web based Chatbot from ScratchIRJET - A Study on Building a Web based Chatbot from Scratch
IRJET - A Study on Building a Web based Chatbot from Scratch
IRJET Journal
 
Igniting Next Level Productivity with AI-Infused Data Integration Workflows
Igniting Next Level Productivity with AI-Infused Data Integration WorkflowsIgniting Next Level Productivity with AI-Infused Data Integration Workflows
Igniting Next Level Productivity with AI-Infused Data Integration Workflows
Safe Software
 
Igniting Next Level Productivity with AI-Infused Data Integration Workflows
Igniting Next Level Productivity with AI-Infused Data Integration Workflows Igniting Next Level Productivity with AI-Infused Data Integration Workflows
Igniting Next Level Productivity with AI-Infused Data Integration Workflows
Safe Software
 
ChatGPT and AI for web developers - Maximiliano Firtman
ChatGPT and AI for web developers - Maximiliano FirtmanChatGPT and AI for web developers - Maximiliano Firtman
ChatGPT and AI for web developers - Maximiliano Firtman
Wey Wey Web
 
Java and graal vm to easily deploy your machine learning services
Java and graal vm to easily deploy your machine learning servicesJava and graal vm to easily deploy your machine learning services
Java and graal vm to easily deploy your machine learning services
Philippe Gottfrois
 
MuleSoft + Augmented Reality & ChatGPT
MuleSoft + Augmented Reality & ChatGPTMuleSoft + Augmented Reality & ChatGPT
MuleSoft + Augmented Reality & ChatGPT
MuleSoft Meetups
 
OSMC 2023 | Experiments with OpenSearch and AI by Jochen Kressin & Leanne La...
OSMC 2023 | Experiments with OpenSearch and AI by Jochen Kressin &  Leanne La...OSMC 2023 | Experiments with OpenSearch and AI by Jochen Kressin &  Leanne La...
OSMC 2023 | Experiments with OpenSearch and AI by Jochen Kressin & Leanne La...
NETWAYS
 
Connector Corner: Extending LLM automation use cases with UiPath GenAI connec...
Connector Corner: Extending LLM automation use cases with UiPath GenAI connec...Connector Corner: Extending LLM automation use cases with UiPath GenAI connec...
Connector Corner: Extending LLM automation use cases with UiPath GenAI connec...
DianaGray10
 
SPOTLIGHT IGNITE (10 MINUTES): THE FUTURE OF DEVELOPER TOOLS: FROM STACKOVERF...
SPOTLIGHT IGNITE (10 MINUTES): THE FUTURE OF DEVELOPER TOOLS: FROM STACKOVERF...SPOTLIGHT IGNITE (10 MINUTES): THE FUTURE OF DEVELOPER TOOLS: FROM STACKOVERF...
SPOTLIGHT IGNITE (10 MINUTES): THE FUTURE OF DEVELOPER TOOLS: FROM STACKOVERF...
DevOpsDays Tel Aviv
 
Company Visitor Management System Report.docx
Company Visitor Management System Report.docxCompany Visitor Management System Report.docx
Company Visitor Management System Report.docx
fantabulous2024
 
hari_duche_updated
hari_duche_updatedhari_duche_updated
hari_duche_updated
Hari Duche
 
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
Fwdays
 
Software Modeling and Artificial Intelligence: friends or foes?
Software Modeling and Artificial Intelligence: friends or foes?Software Modeling and Artificial Intelligence: friends or foes?
Software Modeling and Artificial Intelligence: friends or foes?
Jordi Cabot
 

Recently uploaded (20)

Solidworks Crack 2025 latest new + license code
Solidworks Crack 2025 latest new + license codeSolidworks Crack 2025 latest new + license code
Solidworks Crack 2025 latest new + license code
aneelaramzan63
 
Automation Techniques in RPA - UiPath Certificate
Automation Techniques in RPA - UiPath CertificateAutomation Techniques in RPA - UiPath Certificate
Automation Techniques in RPA - UiPath Certificate
VICTOR MAESTRE RAMIREZ
 
Adobe Marketo Engage Champion Deep Dive - SFDC CRM Synch V2 & Usage Dashboards
Adobe Marketo Engage Champion Deep Dive - SFDC CRM Synch V2 & Usage DashboardsAdobe Marketo Engage Champion Deep Dive - SFDC CRM Synch V2 & Usage Dashboards
Adobe Marketo Engage Champion Deep Dive - SFDC CRM Synch V2 & Usage Dashboards
BradBedford3
 
Not So Common Memory Leaks in Java Webinar
Not So Common Memory Leaks in Java WebinarNot So Common Memory Leaks in Java Webinar
Not So Common Memory Leaks in Java Webinar
Tier1 app
 
Designing AI-Powered APIs on Azure: Best Practices& Considerations
Designing AI-Powered APIs on Azure: Best Practices& ConsiderationsDesigning AI-Powered APIs on Azure: Best Practices& Considerations
Designing AI-Powered APIs on Azure: Best Practices& Considerations
Dinusha Kumarasiri
 
Exploring Code Comprehension in Scientific Programming: Preliminary Insight...
Exploring Code Comprehension  in Scientific Programming:  Preliminary Insight...Exploring Code Comprehension  in Scientific Programming:  Preliminary Insight...
Exploring Code Comprehension in Scientific Programming: Preliminary Insight...
University of Hawai‘i at Mānoa
 
Landscape of Requirements Engineering for/by AI through Literature Review
Landscape of Requirements Engineering for/by AI through Literature ReviewLandscape of Requirements Engineering for/by AI through Literature Review
Landscape of Requirements Engineering for/by AI through Literature Review
Hironori Washizaki
 
Expand your AI adoption with AgentExchange
Expand your AI adoption with AgentExchangeExpand your AI adoption with AgentExchange
Expand your AI adoption with AgentExchange
Fexle Services Pvt. Ltd.
 
Interactive odoo dashboards for sales, CRM , Inventory, Invoice, Purchase, Pr...
Interactive odoo dashboards for sales, CRM , Inventory, Invoice, Purchase, Pr...Interactive odoo dashboards for sales, CRM , Inventory, Invoice, Purchase, Pr...
Interactive odoo dashboards for sales, CRM , Inventory, Invoice, Purchase, Pr...
AxisTechnolabs
 
The Significance of Hardware in Information Systems.pdf
The Significance of Hardware in Information Systems.pdfThe Significance of Hardware in Information Systems.pdf
The Significance of Hardware in Information Systems.pdf
drewplanas10
 
Salesforce Data Cloud- Hyperscale data platform, built for Salesforce.
Salesforce Data Cloud- Hyperscale data platform, built for Salesforce.Salesforce Data Cloud- Hyperscale data platform, built for Salesforce.
Salesforce Data Cloud- Hyperscale data platform, built for Salesforce.
Dele Amefo
 
Adobe Master Collection CC Crack Advance Version 2025
Adobe Master Collection CC Crack Advance Version 2025Adobe Master Collection CC Crack Advance Version 2025
Adobe Master Collection CC Crack Advance Version 2025
kashifyounis067
 
Mastering Fluent Bit: Ultimate Guide to Integrating Telemetry Pipelines with ...
Mastering Fluent Bit: Ultimate Guide to Integrating Telemetry Pipelines with ...Mastering Fluent Bit: Ultimate Guide to Integrating Telemetry Pipelines with ...
Mastering Fluent Bit: Ultimate Guide to Integrating Telemetry Pipelines with ...
Eric D. Schabell
 
FL Studio Producer Edition Crack 2025 Full Version
FL Studio Producer Edition Crack 2025 Full VersionFL Studio Producer Edition Crack 2025 Full Version
FL Studio Producer Edition Crack 2025 Full Version
tahirabibi60507
 
Download YouTube By Click 2025 Free Full Activated
Download YouTube By Click 2025 Free Full ActivatedDownload YouTube By Click 2025 Free Full Activated
Download YouTube By Click 2025 Free Full Activated
saniamalik72555
 
Adobe Lightroom Classic Crack FREE Latest link 2025
Adobe Lightroom Classic Crack FREE Latest link 2025Adobe Lightroom Classic Crack FREE Latest link 2025
Adobe Lightroom Classic Crack FREE Latest link 2025
kashifyounis067
 
Download Wondershare Filmora Crack [2025] With Latest
Download Wondershare Filmora Crack [2025] With LatestDownload Wondershare Filmora Crack [2025] With Latest
Download Wondershare Filmora Crack [2025] With Latest
tahirabibi60507
 
Exceptional Behaviors: How Frequently Are They Tested? (AST 2025)
Exceptional Behaviors: How Frequently Are They Tested? (AST 2025)Exceptional Behaviors: How Frequently Are They Tested? (AST 2025)
Exceptional Behaviors: How Frequently Are They Tested? (AST 2025)
Andre Hora
 
Who Watches the Watchmen (SciFiDevCon 2025)
Who Watches the Watchmen (SciFiDevCon 2025)Who Watches the Watchmen (SciFiDevCon 2025)
Who Watches the Watchmen (SciFiDevCon 2025)
Allon Mureinik
 
Douwan Crack 2025 new verson+ License code
Douwan Crack 2025 new verson+ License codeDouwan Crack 2025 new verson+ License code
Douwan Crack 2025 new verson+ License code
aneelaramzan63
 
Solidworks Crack 2025 latest new + license code
Solidworks Crack 2025 latest new + license codeSolidworks Crack 2025 latest new + license code
Solidworks Crack 2025 latest new + license code
aneelaramzan63
 
Automation Techniques in RPA - UiPath Certificate
Automation Techniques in RPA - UiPath CertificateAutomation Techniques in RPA - UiPath Certificate
Automation Techniques in RPA - UiPath Certificate
VICTOR MAESTRE RAMIREZ
 
Adobe Marketo Engage Champion Deep Dive - SFDC CRM Synch V2 & Usage Dashboards
Adobe Marketo Engage Champion Deep Dive - SFDC CRM Synch V2 & Usage DashboardsAdobe Marketo Engage Champion Deep Dive - SFDC CRM Synch V2 & Usage Dashboards
Adobe Marketo Engage Champion Deep Dive - SFDC CRM Synch V2 & Usage Dashboards
BradBedford3
 
Not So Common Memory Leaks in Java Webinar
Not So Common Memory Leaks in Java WebinarNot So Common Memory Leaks in Java Webinar
Not So Common Memory Leaks in Java Webinar
Tier1 app
 
Designing AI-Powered APIs on Azure: Best Practices& Considerations
Designing AI-Powered APIs on Azure: Best Practices& ConsiderationsDesigning AI-Powered APIs on Azure: Best Practices& Considerations
Designing AI-Powered APIs on Azure: Best Practices& Considerations
Dinusha Kumarasiri
 
Exploring Code Comprehension in Scientific Programming: Preliminary Insight...
Exploring Code Comprehension  in Scientific Programming:  Preliminary Insight...Exploring Code Comprehension  in Scientific Programming:  Preliminary Insight...
Exploring Code Comprehension in Scientific Programming: Preliminary Insight...
University of Hawai‘i at Mānoa
 
Landscape of Requirements Engineering for/by AI through Literature Review
Landscape of Requirements Engineering for/by AI through Literature ReviewLandscape of Requirements Engineering for/by AI through Literature Review
Landscape of Requirements Engineering for/by AI through Literature Review
Hironori Washizaki
 
Expand your AI adoption with AgentExchange
Expand your AI adoption with AgentExchangeExpand your AI adoption with AgentExchange
Expand your AI adoption with AgentExchange
Fexle Services Pvt. Ltd.
 
Interactive odoo dashboards for sales, CRM , Inventory, Invoice, Purchase, Pr...
Interactive odoo dashboards for sales, CRM , Inventory, Invoice, Purchase, Pr...Interactive odoo dashboards for sales, CRM , Inventory, Invoice, Purchase, Pr...
Interactive odoo dashboards for sales, CRM , Inventory, Invoice, Purchase, Pr...
AxisTechnolabs
 
The Significance of Hardware in Information Systems.pdf
The Significance of Hardware in Information Systems.pdfThe Significance of Hardware in Information Systems.pdf
The Significance of Hardware in Information Systems.pdf
drewplanas10
 
Salesforce Data Cloud- Hyperscale data platform, built for Salesforce.
Salesforce Data Cloud- Hyperscale data platform, built for Salesforce.Salesforce Data Cloud- Hyperscale data platform, built for Salesforce.
Salesforce Data Cloud- Hyperscale data platform, built for Salesforce.
Dele Amefo
 
Adobe Master Collection CC Crack Advance Version 2025
Adobe Master Collection CC Crack Advance Version 2025Adobe Master Collection CC Crack Advance Version 2025
Adobe Master Collection CC Crack Advance Version 2025
kashifyounis067
 
Mastering Fluent Bit: Ultimate Guide to Integrating Telemetry Pipelines with ...
Mastering Fluent Bit: Ultimate Guide to Integrating Telemetry Pipelines with ...Mastering Fluent Bit: Ultimate Guide to Integrating Telemetry Pipelines with ...
Mastering Fluent Bit: Ultimate Guide to Integrating Telemetry Pipelines with ...
Eric D. Schabell
 
FL Studio Producer Edition Crack 2025 Full Version
FL Studio Producer Edition Crack 2025 Full VersionFL Studio Producer Edition Crack 2025 Full Version
FL Studio Producer Edition Crack 2025 Full Version
tahirabibi60507
 
Download YouTube By Click 2025 Free Full Activated
Download YouTube By Click 2025 Free Full ActivatedDownload YouTube By Click 2025 Free Full Activated
Download YouTube By Click 2025 Free Full Activated
saniamalik72555
 
Adobe Lightroom Classic Crack FREE Latest link 2025
Adobe Lightroom Classic Crack FREE Latest link 2025Adobe Lightroom Classic Crack FREE Latest link 2025
Adobe Lightroom Classic Crack FREE Latest link 2025
kashifyounis067
 
Download Wondershare Filmora Crack [2025] With Latest
Download Wondershare Filmora Crack [2025] With LatestDownload Wondershare Filmora Crack [2025] With Latest
Download Wondershare Filmora Crack [2025] With Latest
tahirabibi60507
 
Exceptional Behaviors: How Frequently Are They Tested? (AST 2025)
Exceptional Behaviors: How Frequently Are They Tested? (AST 2025)Exceptional Behaviors: How Frequently Are They Tested? (AST 2025)
Exceptional Behaviors: How Frequently Are They Tested? (AST 2025)
Andre Hora
 
Who Watches the Watchmen (SciFiDevCon 2025)
Who Watches the Watchmen (SciFiDevCon 2025)Who Watches the Watchmen (SciFiDevCon 2025)
Who Watches the Watchmen (SciFiDevCon 2025)
Allon Mureinik
 
Douwan Crack 2025 new verson+ License code
Douwan Crack 2025 new verson+ License codeDouwan Crack 2025 new verson+ License code
Douwan Crack 2025 new verson+ License code
aneelaramzan63
 
Ad

wang-Leveraging-the-Power-of-ChatGPT-and-Vector-Databases-in-the-FreeBSD-Expert-System-slides.pdf

  • 1. Leveraging the Power of ChatGPT and Vector Databases in the FreeBSD Expert System Yan-Hao Wang, AsiaBSDCon 2024
  • 2. Who Am I My name is Yan-Hao Wang, a senior high student in Taiwan and FreeBSD Taiwan intern since 2022. I've been involved in various tasks such as 1. Developing an online document/man-page editor. 2. Crafting tests for command utilities like gunion(8) and printenv(1). 3. Translating FreeBSD documents.
  • 3. GitHub Repository All codes have been uploaded to the freebsd_data repository. The slide will also be put on it. If you're interested, you can access them there.
  • 4. Outline 1. Introduction of the Expert System 2. Introduction of ChatGPT 3. Development Process a. Data Cleaning and Extraction b. Embedded Model and Vector Database c. Integration with ChatGPT 4. OpenAI GPTs as Potential Replacements 5. Summary
  • 5. Expert System Expert system is a system that can answer user questions accurately in a specific domain. It consists of two parts 1. Knowledge Base: stores all the relevant information related to the domain of expertise. 2. Rule Engine: Contain some predefined rules by the data scientist. It processes the user's questions and applies rules to generate accurate responses.
  • 6. Expert System Modern expert systems use machine learning to simulate the behavior or judgment of domain experts. ML model
  • 7. ChatGPT ChatGPT (Chat Generative Pre-trained Transformer) is a chatbot developed by OpenAI and launched on November 30, 2022. Based on a large language model (LLM).
  • 8. FreeBSD Expert System There are multiple ways to build a FreeBSD expert system. 1. Train a new ML model with FreeBSD data. No. I am not an ML expert and it costs a lot.
  • 9. FreeBSD Expert System There are multiple ways to build a FreeBSD expert system. 1. Train a new ML model with FreeBSD data. No. I am not an ML expert and it costs a lot. 2. Use the existing model such as ChatGPT. But we definitely won’t call ChatGPT a FreeBSD expert system.
  • 10. FreeBSD Expert System ChatGPT uses amount of data for training. So he can answer problems in every domain though may not be correct. It's more like a general-purpose system.
  • 11. FreeBSD Expert System ChatGPT uses amount of data for training. So he can answer problems in every domain though may not be correct. It's more like a general-purpose system. The limitation of why ChatGPT can’t be called a FreeBSD expert system 1. Chatgpt may tendency to hallucinate answers when asked about unfamiliar domains. 2. The data is not new enough (ChatGPT uses data before 2021 to train). So he can’t answer the newest question.
  • 12. FreeBSD Expert System There are two ways to handle the limitation. 1. Fine-tune. fine-tuning is a process that takes a model that has already been trained for one given task and then tunes or tweaks the model to make it perform a second similar task.
  • 13. FreeBSD Expert System There are two ways to handle the limitation. First way is, fine-tune. fine-tuning is a process that takes a model that has already been trained for one given task and then tunes or tweaks the model to make it perform a similar task. OpenAI has supplied this API. For the open-source model, you should use Pytorch and TensorFlow to handle it.
  • 14. FreeBSD Expert System However, fine-tuning is still hard for AI-unfamiliar developer. And It also cost a lot. The second way is Retrieval Augmented Generation (RAG). Basically, it is just like when you use ChatGPT, you can provide related info about your question, and it can provide a much more accurate response. This is an acceptable way, so we will use the embedded model and vector database to achieve this.
  • 15. Embedding Model An embedding model is a type of ML model that converts input data, such as words or sentences, into numerical representations called embedding vectors (or simply vectors). These embeddings capture the semantic meaning or context of the input in a continuous vector space, and they can be used for tasks such as text classification and sentiment analysis.
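To show the shape of the interface (text in, fixed-length vector out), here is a toy stand-in based on the hashing trick over a bag of words. It is deliberately NOT a semantic model like gte-base: it only matches shared words, whereas a real embedding model also places synonyms and paraphrases close together. `toy_embed` and `dim=256` are hypothetical choices.

```python
import hashlib
import math
import re


def toy_embed(text, dim=256):
    """Toy hashing-trick embedding: NOT semantic, only illustrates text -> unit vector."""
    vec = [0.0] * dim
    for token in re.findall(r"[a-z0-9]+", text.lower()):
        # Stable hash (unlike Python's built-in hash, which is randomized per process).
        digest = hashlib.blake2b(token.encode("utf-8"), digest_size=8).digest()
        vec[int.from_bytes(digest, "little") % dim] += 1.0
    norm = math.sqrt(sum(x * x for x in vec))
    # L2-normalize so a dot product between two vectors equals their cosine similarity.
    return [x / norm for x in vec] if norm else vec


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))
```

Swapping in a real model would change only the body of `toy_embed`; code that stores and compares the vectors stays the same.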
  • 16. Vector Database Vector databases are designed to store vectors efficiently. These databases employ various search algorithms to find the most similar vectors, such as t. Numerous open-source vector databases are available to choose from.
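At its core, a vector database answers "which stored vectors are most similar to this query vector?". The sketch below does this by exact brute force with cosine similarity; real vector databases add persistence and usually approximate indexes to scale. `InMemoryVectorStore` is a hypothetical name, not a real library class.

```python
import heapq
import math


def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    if na == 0 or nb == 0:
        return 0.0
    return dot / (na * nb)


class InMemoryVectorStore:
    """Exact (brute-force) top-k search; illustrative only."""

    def __init__(self):
        self._items = []  # list of (doc_id, vector)

    def add(self, doc_id, vector):
        self._items.append((doc_id, list(vector)))

    def search(self, query, k=3):
        # Score every stored vector, then keep the k highest scores.
        scored = ((cosine(query, vec), doc_id) for doc_id, vec in self._items)
        return [(doc_id, score) for score, doc_id in heapq.nlargest(k, scored)]
```

For example, with vectors stored for `gunion.8` ([0.9, 0.1, 0.0]), `zfs.8` ([0.1, 0.9, 0.0]) and `ifconfig.8` ([0.0, 0.2, 0.9]), the query [1.0, 0.0, 0.0] with `k=2` returns `gunion.8` first and `zfs.8` second.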
  • 19. Development - Data Extraction We use a simple find command to extract the data. Because the data sources come in very different formats, we need to convert them to plain text; we use the “hs-pandoc” package for the conversion.
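The find-then-convert step can be sketched in Python as follows. This assumes the `pandoc` binary installed by hs-pandoc is on the PATH. The suffix-to-reader mapping in `READERS` and the output naming are hypothetical; pandoc's `--from`, `--to`, `--output` options and its `html`, `markdown`, `man`, and `plain` formats are real.

```python
import os
import subprocess
from pathlib import Path

# Hypothetical mapping from file suffix to pandoc input format.
READERS = {".html": "html", ".md": "markdown", ".1": "man", ".8": "man"}


def find_sources(root):
    """Rough Python equivalent of `find root -type f` filtered by suffix."""
    for dirpath, _dirs, files in os.walk(root):
        for name in sorted(files):
            path = Path(dirpath) / name
            if path.suffix in READERS:
                yield path


def pandoc_command(src, dst):
    """Build the pandoc argument list that converts one file to plain text."""
    return ["pandoc", "--from", READERS[src.suffix], "--to", "plain",
            "--output", str(dst), str(src)]


def convert_all(root, out_dir):
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    for src in find_sources(root):
        # Simplification: files with the same name in different directories collide.
        dst = out / (src.name + ".txt")
        subprocess.run(pandoc_command(src, dst), check=True)
```

`check=True` makes a failed conversion raise immediately instead of silently producing a missing file.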
  • 20. Development - Data Cleaning Remove unrelated information: a simple find command removes the unrelated data.
  • 21. Development - Data Cleaning In practice, data cleaning is the most time-consuming step: data scientists reportedly spend 60% of their time cleaning data rather than creating insights. Tools such as OpenRefine can help clean the data.
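A typical cleaning pass drops boilerplate lines, blank lines, and repeated lines, and normalizes whitespace. The patterns in `UNRELATED_PATTERNS` are hypothetical examples of "unrelated text"; a real cleaner would be tuned to the actual sources.

```python
import re

# Hypothetical patterns for lines that carry no FreeBSD knowledge.
UNRELATED_PATTERNS = [
    re.compile(r"^(Table of Contents|Edit this page|Last modified on)\b", re.IGNORECASE),
    re.compile(r"^\$FreeBSD\$$"),
]


def clean_text(text):
    """Drop blank, boilerplate and repeated lines; collapse runs of whitespace."""
    cleaned, seen = [], set()
    for line in text.splitlines():
        normalized = re.sub(r"\s+", " ", line).strip()
        if not normalized:
            continue  # blank line
        if any(p.search(normalized) for p in UNRELATED_PATTERNS):
            continue  # boilerplate
        if normalized in seen:
            continue  # exact repeat of an earlier line
        seen.add(normalized)
        cleaned.append(normalized)
    return "\n".join(cleaned)
```

Dropping every repeated line is aggressive and can remove legitimately repeated content, so in practice you would inspect the output (for example in OpenRefine) before embedding it.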
  • 22. Development - Embedding Model OpenAI offers an embedding model API, and many open-source embedding models are also available online. In this project, we use an open-source model (“gte-base”). (Figures: the MTEB Leaderboard on Hugging Face; the OpenAI embedding model.)
  • 23. Development - Embedding Model Example embedding vectors: vector 1 = [0.2, 0.3 … 2.3], vector 2 = [0.3, 0.6 … 1.7], vector 3 = [0.9, 0.1 … 3.1].
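Similarity between embedding vectors like these is commonly measured with cosine similarity, the measure used in our local tests. For two n-dimensional vectors u and v, with a small two-dimensional worked example (chosen for illustration, since the vectors above are elided):

```latex
\cos(\mathbf{u},\mathbf{v})
  = \frac{\mathbf{u}\cdot\mathbf{v}}{\lVert\mathbf{u}\rVert\,\lVert\mathbf{v}\rVert}
  = \frac{\sum_{i=1}^{n} u_i v_i}{\sqrt{\sum_{i=1}^{n} u_i^2}\,\sqrt{\sum_{i=1}^{n} v_i^2}}

\mathbf{u} = (3, 4),\ \mathbf{v} = (4, 3):\quad
\cos(\mathbf{u},\mathbf{v}) = \frac{3\cdot 4 + 4\cdot 3}{5 \cdot 5} = \frac{24}{25} = 0.96
```

The value ranges from -1 to 1, and only the direction of the vectors matters, not their length.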
  • 24. Development - Embedding Model We use “gte-base” as our model. Its model size is only 0.22 GB, small enough for my modest GPU (NVIDIA GeForce GTX 1050 Ti) to handle. Embedding all the documents takes only 590 MB of GPU memory and 67 minutes.
  • 27. Development - Embedding Model There are multiple factors (hyperparameters) we can tune here, for example: 1. The length of the sentences (text chunks) to embed. 2. Which metadata to keep. 3. Which model to use, and whether the embedding model itself needs tuning. All of these hyperparameters should be tried multiple times to find the best setting, and the best setting differs from field to field (the No Free Lunch (NFL) theorems).
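The first hyperparameter, text length, is usually controlled by splitting each document into chunks before embedding. A minimal sketch that uses overlapping word windows; measuring length in words (rather than model tokens) and the defaults `max_words=200`, `overlap=40` are hypothetical choices to be tuned.

```python
def chunk_words(text, max_words=200, overlap=40):
    """Split text into windows of at most max_words words, overlapping by `overlap` words."""
    if not 0 <= overlap < max_words:
        raise ValueError("overlap must be in [0, max_words)")
    words = text.split()
    chunks = []
    step = max_words - overlap
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + max_words]))
        if start + max_words >= len(words):
            break  # the last window already reaches the end of the text
    return chunks
```

Smaller chunks give more precise matches but less context per result; the overlap keeps a sentence that straddles a boundary from being split across two unrelated chunks.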
  • 28. Development - Vector database As mentioned earlier, there are many vector databases to choose from. In our local tests, however, we simply store the vectors in a file and use a plain cosine-similarity search, because our data is small (< 100 MB).
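The file-plus-cosine approach can be as simple as one JSON object per line. A minimal sketch, assuming vectors fit in memory; the JSONL layout and the names `save_index`, `load_index`, `query_index` are hypothetical. Vectors are normalized when saved, so a plain dot product at query time equals cosine similarity.

```python
import json
import math


def _normalize(vec):
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else list(vec)


def save_index(path, records):
    """records: iterable of (doc_id, text, vector); writes one JSON object per line."""
    with open(path, "w", encoding="utf-8") as f:
        for doc_id, text, vec in records:
            f.write(json.dumps({"id": doc_id, "text": text, "vector": _normalize(vec)}) + "\n")


def load_index(path):
    with open(path, encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]


def query_index(index, query_vec, k=3):
    """Brute-force cosine search over every stored record."""
    q = _normalize(query_vec)
    # Stored vectors are unit length, so the dot product is the cosine similarity.
    def score(record):
        return sum(a * b for a, b in zip(q, record["vector"]))
    return sorted(index, key=score, reverse=True)[:k]
```

At under 100 MB, scanning every vector per query is fast enough, which is why a dedicated vector database is not needed for local testing.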
  • 29. Development - Query Question: How to use the gunion command in FreeBSD? Query result: 1. Man page of gunion 2. Man page of gunion 3. FreeBSD status report (A New GEOM Facility, gunion) 4. Unrelated info …
  • 32. Development - Integration with ChatGPT So we need to host the embedding model and vector database behind an open API that users can call, and then integrate that API with ChatGPT. 1. The first way is easy: we just write Python code that calls both the ChatGPT API and our API. But this is not friendly to ordinary users. 2. Develop a ChatGPT plugin. A ChatGPT plugin lets us register our own API; while answering a question, ChatGPT calls the API and uses its response. This is the best approach for our project, since users only need to enable the plugin in ChatGPT.
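Both integration options need the hosted "open API". A minimal sketch with Python's standard-library HTTP server, assuming a hypothetical `GET /search?q=...` endpoint that returns JSON. `DOCS` and the keyword `search` function are placeholders for "embed the query, then search the vector store". A plugin would describe such an endpoint in an OpenAPI spec; a Python script would simply call it before calling the ChatGPT API.

```python
import json
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer
from urllib.parse import parse_qs, urlparse

# Hypothetical placeholder corpus; the real service would query the vector store.
DOCS = {
    "gunion.8": "placeholder text of the gunion(8) manual page",
    "printenv.1": "placeholder text of the printenv(1) manual page",
}


def search(query, k=3):
    """Stand-in retrieval: keyword match instead of embedding similarity."""
    terms = query.lower().split()
    hits = [{"id": doc_id, "text": text} for doc_id, text in DOCS.items()
            if any(t in text.lower() for t in terms)]
    return hits[:k]


class SearchHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        url = urlparse(self.path)
        if url.path != "/search":
            self.send_error(404)
            return
        query = parse_qs(url.query).get("q", [""])[0]
        body = json.dumps({"query": query, "results": search(query)}).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # silence per-request logging in this example


def serve(host="127.0.0.1", port=8000):
    ThreadingHTTPServer((host, port), SearchHandler).serve_forever()
```

For a public deployment you would put this behind a production web server with TLS rather than exposing `http.server` directly.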
  • 34. OpenAI GPTs as Potential Replacements GPTs were launched in November 2023. They provide an easy way to create a custom GPT from any data you have, which makes them a potential replacement for our project: we only need to upload the data from step 1 to get a custom expert system. In addition, starting March 19, 2024, users will no longer be able to install new plugins or create new conversations with existing plugins.
  • 35. Wiki Future Audiences The idea was inspired by Wiki (Wikimedia's Future Audiences project), which had already developed a ChatGPT plugin but stopped the effort after GPTs were released. In their words: “This timing also coincides with OpenAI’s move away from the plugin marketplace for ChatGPT, and towards no/low-code customizable GPTs. This shift has made our plugin in its current form inaccessible to new users and largely redundant. While we could repurpose this functionality towards being a GPT, we don’t believe we would learn significantly more beyond how to create a product within the OpenAI ecosystem. Lessons learned, ChatGPT has not become the new information seeking paradigm (yet?).”
  • 36. Summary Comparison of the three solutions:
    RAG: cost medium ~ hard; advantages: privacy, flexibility; disadvantage: cost.
    GPTs (custom GPT): cost small; advantage: fast; disadvantages: privacy, flexibility.
    ChatGPT Plus (browsing the internet): cost small; advantage: fast; disadvantages: data sources are different, flexibility.
  • 37. Summary The significance of LLMs is likely to grow rapidly, marking a pivotal shift in our technological landscape. Although we may not complete the entire production process, it is worthwhile to follow future trends and try to combine them with FreeBSD.
  • 38. Reference ● What is an Expert System? ● Do data scientists spend 80% of their time cleaning data? Turns out, no? ● Wiki, Talk:Future Audiences