How To Use Grounding For Your LLMs With Text Embeddings
Many people are now starting to think about how to bring generative AI and large language models (LLMs) to
production services. You may be wondering "How do I integrate LLMs or AI chatbots with existing IT
systems, databases and business data?", "We have thousands of products. How can I get an LLM to memorize
them all precisely?", or "How do I handle the hallucination issues in AI chatbots to build a reliable service?".
Here is a quick solution: grounding with embeddings and vector search.
What is grounding? What are embeddings and vector search? In this post, we will learn these crucial
concepts for building reliable generative AI services for enterprise use. But before we dive deeper, here is an
example:
Semantic search on 8 million Stack Overflow questions in milliseconds. (Try the demo here)
This demo is available as a public live demo here. Select "STACKOVERFLOW" and enter any coding
question as a query; it runs a text search across 8 million questions posted on Stack Overflow.
LLM-enabled semantic search: The 8 million Stack Overflow questions and the query text are both
interpreted by Vertex AI Generative AI models. The model understands the meaning and intent
(semantics) of the text and code snippets in the question body at librarian-level precision. The demo
leverages this ability to find highly relevant questions, going far beyond simple keyword search
in terms of user experience. For example, if you enter "How do I write a class that instantiates only
once", the demo shows "How to create a singleton class" at the top, as the model knows their
meanings are the same in the context of computer programming.
Grounded to business facts: In this demo, we didn't try to have the LLM memorize the 8 million
items with complex and lengthy prompt engineering. Instead, we attached the Stack Overflow dataset
to the model as an external memory using vector search, and used no prompt engineering. This
means the outputs are all directly "grounded" (connected) to the business facts, rather than being artificial
output from the LLM. So the demo is ready to be served today as a production service with mission-critical
business responsibility. It does not suffer from the limitations of LLM memory or unexpected
LLM behaviors such as hallucinations.
Scalable and fast: The demo returns search results in tens of milliseconds while retaining the
deep semantic understanding capability. The demo is also capable of scaling out to handle thousands
of search queries every second. This is enabled by the combination of LLM embeddings and
Google AI's vector search technology.
The key enablers of this solution are 1) the embeddings generated with Vertex AI Embeddings for Text
and 2) fast and scalable vector search by Vertex AI Matching Engine. Let's start by taking a look at these
technologies.
Embeddings for Text: The API takes text input of up to 3,072 input tokens and outputs 768-dimensional
text embeddings, and is available as a public preview. As of May 10, 2023, the pricing is
$0.0001 per 1,000 characters (the latest pricing is available on the Pricing for Generative AI models
page).
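If you want to get a feel for the API, here is a minimal sketch of calling it with the Vertex AI Python SDK. The module path and the model name "textembedding-gecko@001" reflect the public preview at the time of writing and may differ in your SDK version; the project ID is a placeholder.

# Minimal sketch: get 768-dimensional text embeddings from Vertex AI.
# Assumes the google-cloud-aiplatform SDK is installed and the project
# has access to the text embeddings model in public preview.
import vertexai
from vertexai.preview.language_models import TextEmbeddingModel

vertexai.init(project="your-project-id", location="us-central1")  # placeholder project

model = TextEmbeddingModel.from_pretrained("textembedding-gecko@001")

texts = [
    "How do I write a class that instantiates only once?",
    "How to create a singleton class",
]
embeddings = model.get_embeddings(texts)

for text, embedding in zip(texts, embeddings):
    vector = embedding.values  # a list of 768 floats
    print(f"{text!r} -> {len(vector)} dimensions")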
Embeddings for Image: Based on Google AI's Contrastive Captioners (CoCa) model, the API takes
either image or text input and outputs 1,024-dimensional image/text multimodal embeddings, available
to trusted testers. This API outputs so-called "multimodal" embeddings, enabling multimodal queries
where you can run semantic search on images with text queries, or vice versa. We will feature this
API in another blog post soon.
In this blog, we will explain more about why embeddings are useful and show you how to build an
application leveraging the Embeddings API for Text. In a future blog post, we will provide a deep dive on
the Embeddings API for Image.
What are embeddings?
So, what are semantic search and embeddings? With the rise of LLMs, why is it becoming important for
IT engineers and ITDMs to understand how they work? To find out, take five minutes to watch this video from a
Google I/O 2023 session:
Also, Foundational courses: Embeddings on the Google Machine Learning Crash Course and Meet AI’s
multitool: Vector embeddings by Dale Markowitz are great materials to learn more about embeddings.
LLM-enabled Semantic Search: Text embeddings can be used to represent both the meaning and
intent of a user's query and of documents in the embedding space. Documents that have a meaning
similar to the user's query intent can be found quickly with vector search technology. The model is
capable of generating text embeddings that capture the subtle nuances of each sentence and
paragraph in the document.
LLM-enabled Text Classification: LLM text embeddings can be used for text classification with a
deep understanding of different contexts without any training or fine-tuning (so-called zero-shot
learning); see the sketch after this list. This wasn't possible with past language models without task-specific training.
LLM-enabled Recommendation: The text embeddings can be used in recommendation systems as
a strong feature for training recommendation models such as the Two-Tower model. The model learns the
relationship between the query and candidate embeddings, resulting in a next-gen user experience with
semantic product recommendation.
LLM-enabled Clustering, Anomaly Detection, Sentiment Analysis, and more can also be
handled with this LLM-level deep semantic understanding.
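As a concrete illustration of the zero-shot classification idea above, here is a sketch that embeds a short natural-language description of each class and assigns a text to the class whose embedding is closest. The embed() helper is hypothetical; it simply wraps the get_embeddings() call from the earlier snippet, and the class names and descriptions are made up for illustration.

import numpy as np

def embed(text: str) -> np.ndarray:
    # Hypothetical helper: wraps TextEmbeddingModel.get_embeddings() from the
    # earlier snippet and returns the vector as a NumPy array.
    return np.array(model.get_embeddings([text])[0].values)

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Describe each class in natural language; no training data is needed.
class_descriptions = {
    "billing": "Questions about invoices, payments and refunds",
    "technical": "Questions about errors, bugs and outages",
    "account": "Questions about sign-in, passwords and profiles",
}
class_vectors = {name: embed(desc) for name, desc in class_descriptions.items()}

def classify(text: str) -> str:
    query = embed(text)
    return max(class_vectors, key=lambda name: cosine_similarity(query, class_vectors[name]))

print(classify("I was charged twice for my subscription"))  # expected: "billing"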
The API can take 3,072 input tokens, so it can digest the overall meaning of a long text and even
programming code, and represent it as a single embedding. It is like having a librarian knowledgeable about
a wide variety of industries, reading through millions of texts carefully, and sorting them with millions of
nano-categories that can distinguish even slight differences in subtle nuance.
By visualizing the embedding space, you can actually observe how the model sorts the texts with
"librarian-level" precision. Nomic AI provides a platform called Atlas for storing, visualizing and interacting
with embedding spaces at high scalability and with a smooth UI, and they worked with Google to
visualize the embedding space of the 8 million Stack Overflow questions. You can explore
the space, zooming in and out on each data point, in your browser on this page, courtesy of Nomic AI.
8 million Stack Overflow questions embedding space, visualized by Nomic AI Atlas (Try exploring it here)
Examples of the "librarian-level" semantic understanding by Embeddings API with Stack Overflow questions
Note that this demo didn't require any training or fine-tuning with computer-programming-specific
datasets. This is the innovative part of the zero-shot learning capability of the LLM: it can be applied to a
wide variety of industries, including finance, healthcare, retail, manufacturing, construction, media, and
more, for deep semantic search on industry-focused business documents without spending the time and
cost of collecting industry-specific datasets and training models.
The problem is "how to find similar embeddings in the embedding space". Since embeddings are vectors,
this can be done by calculating the distance or similarity between vectors, as shown below.
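For a small number of embeddings, the brute-force approach is straightforward. Here is a sketch with NumPy that scores every stored embedding against a query by cosine similarity and returns the top matches; random vectors stand in for real embeddings.

import numpy as np

def top_k_similar(query_vector: np.ndarray, doc_vectors: np.ndarray, k: int = 5):
    # Normalize so that a dot product equals cosine similarity.
    docs = doc_vectors / np.linalg.norm(doc_vectors, axis=1, keepdims=True)
    query = query_vector / np.linalg.norm(query_vector)
    scores = docs @ query                    # one score per document: O(dims x items)
    top = np.argsort(scores)[::-1][:k]       # indices of the k closest documents
    return list(zip(top.tolist(), scores[top].tolist()))

rng = np.random.default_rng(0)
doc_vectors = rng.normal(size=(10_000, 768))   # stand-in for 10,000 document embeddings
query_vector = rng.normal(size=768)            # stand-in for a query embedding
print(top_k_similar(query_vector, doc_vectors, k=3))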
Fast and scalable vector search isn't easy: brute force search takes too long, on the order of O(dims × items).
But this isn't easy when you have millions or billions of embeddings. For example, if you have 8 million
embeddings with 768 dimensions, you would need to repeat the calculation on the order of 8 million × 768
times, which would take a very long time to finish. When we tried this on BigQuery with one million
embeddings five years ago, it took 20 seconds.
So researchers have been studying a technique called Approximate Nearest Neighbor (ANN) for
faster search. ANN uses "vector quantization" to separate the space into multiple spaces with a tree
structure. This is similar to an index in a relational database improving query performance,
enabling very fast and scalable search over billions of embeddings.
With the rise of LLMs, ANN has been rapidly gaining popularity as vector search technology.
Approximate Nearest Neighbor (ANN): fast and scalable vector search by building an index with vector quantization.
In 2020, Google Research published a new ANN algorithm called ScaNN. It is considered one of the best
ANN algorithms in the industry and an important foundation for search and recommendation in
major Google services such as Google Search, YouTube and many others.
ScaNN: the vector search technology behind Google Search, YouTube and Play, achieving higher accuracy with shorter latency. Benchmark of queries per second versus accuracy (Recall@10) comparing ScaNN with hnsw (faiss), hnswlib, NGT-panng, NGT-onng, kgraph and SW-graph (nmslib); the figure also contrasts ScaNN's anisotropic loss with the traditional loss.
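ScaNN is also available as an open-source library (pip install scann), so you can try the algorithm locally before moving to a managed service. Below is a rough sketch; the tree and scoring parameters follow the style of ScaNN's published examples and would need tuning for a real dataset.

import numpy as np
import scann

# Stand-in dataset: 100,000 embeddings with 768 dimensions, L2-normalized so
# that dot-product search behaves like cosine similarity.
rng = np.random.default_rng(0)
dataset = rng.normal(size=(100_000, 768)).astype(np.float32)
dataset /= np.linalg.norm(dataset, axis=1, keepdims=True)

searcher = (
    scann.scann_ops_pybind.builder(dataset, 10, "dot_product")
    .tree(num_leaves=1000, num_leaves_to_search=100, training_sample_size=25_000)
    .score_ah(2, anisotropic_quantization_threshold=0.2)  # anisotropic quantization
    .reorder(100)                                         # re-rank top candidates exactly
    .build()
)

query = dataset[0]
neighbors, distances = searcher.search(query)
print(neighbors[:5], distances[:5])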
Google Cloud developers can take full advantage of Google's vector search technology with Vertex AI
Matching Engine. With this fully managed service, developers can simply add their embeddings to an index
and issue a search query with a query embedding to get blazingly fast vector search. In the case of the
Stack Overflow demo, Matching Engine finds relevant questions among 8 million embeddings in tens of
milliseconds.
With Matching Engine, you don't need to spend much time and money building your own vector search
service from scratch or with open source tools if your goal is high scalability, availability and
maintainability for production systems.
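To give a rough idea of what a query looks like in code, here is a hedged sketch using the Python SDK. The endpoint and deployed index names are placeholders, and it assumes an index has already been built and deployed (the sample Notebook mentioned below walks through the full setup).

from google.cloud import aiplatform

aiplatform.init(project="your-project-id", location="us-central1")  # placeholders

# Placeholder resource name of an already-deployed Matching Engine index endpoint.
index_endpoint = aiplatform.MatchingEngineIndexEndpoint(
    index_endpoint_name="projects/123/locations/us-central1/indexEndpoints/456"
)

# query_embedding would be a 768-dimensional vector returned by the
# Embeddings for Text API for the user's query.
response = index_endpoint.match(
    deployed_index_id="stack_overflow_questions",  # placeholder deployed index ID
    queries=[query_embedding],
    num_neighbors=10,
)

for neighbor in response[0]:
    print(neighbor.id, neighbor.distance)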
In the case of the Stack Overflow demo shown earlier, we've built a system with the following architecture.
The demo architecture has two parts: 1) building a Matching Engine index with Vertex AI Workbench and
the Stack Overflow dataset on BigQuery (on the right) and 2) processing vector search requests with
Cloud Run (on the left) and Matching Engine. For the details, please see the sample Notebook on GitHub.