Ultimate Guide to Embedding Models

Michael Ryaboy

CONTENTS

Part 1: The critical role of embedding models in artificial intelligence
Part 2: Understanding embeddings: from sentences to vectors
Part 3: The modern landscape of embedding models
Part 4: Selecting the right embedding model: A systematic approach
Part 5: Strategies for evaluating an embedding model
Part 6: Key takeaways
The critical role of embedding models in artificial intelligence

PART 1
What are embedding models?
Embedding models are a fundamental component of modern AI systems, serving as the
bridge between raw data and machine-understandable representations. At their core, these
models transform discrete objects such as words, sentences, images, or any other form of
data into continuous vector representations in a high-dimensional space.
Consider a simple analogy: if words were cities on a map, embeddings would be their GPS
coordinates. Just as nearby cities on a map are geographically related, words with similar
meanings have embeddings that are close to each other in the vector space.

For example, in a toy two-dimensional embedding space:

Cat: [0.21, 0.43]
Dog: [0.23, 0.44]
Penguin: [0.74, 0.51]
Kangaroo: [0.76, 0.57]
Elephant: [0.78, 0.55]
Snake: [0.80, 0.12]

Why embeddings matter in AI
The importance of embedding models in AI applications is difficult to overstate. They enable
machines to capture semantic relationships, understand context, and perform complex tasks
across various domains.
Here are some key reasons why embeddings are crucial:
Semantic understanding: Embeddings capture meaning beyond simple keyword
matching, allowing AI systems to understand nuances and context. This means you can
effectively compare very different kinds of data, text queries and documents, for
example.
Dimensionality reduction: They provide a compact representation of data, reducing high-
dimensional sparse vectors to dense, lower-dimensional ones.
Cross-modal applications: Embeddings enable comparisons between different types of
data (e.g. text and images) in a shared vector space.
Efficiency: They allow for fast similarity computations, essential for large-scale
applications like search and recommendation systems.
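
To make that last point concrete, here is a minimal sketch of measuring similarity between embeddings with cosine similarity, using the hypothetical 2D animal vectors from the illustration above:

import numpy as np

# Toy 2D embeddings from the illustration above (hypothetical values)
cat = np.array([0.21, 0.43])
dog = np.array([0.23, 0.44])
elephant = np.array([0.78, 0.55])

def cosine_similarity(a, b):
    """Cosine similarity: close to 1.0 means the vectors point in the same direction."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

print(cosine_similarity(cat, dog))       # ~0.9995, 'cat' and 'dog' are close
print(cosine_similarity(cat, elephant))  # ~0.88, noticeably further apart

In real systems the vectors have hundreds or thousands of dimensions, but the comparison works exactly the same way.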

Understanding
embeddings: from
sentences to vectors

PART 2
The concept of vector spaces
To grasp the power of embeddings, it's essential to understand the concept of vector
spaces. In an embedding space, each dimension represents a feature or attribute of the
data. The position of a vector in this space encodes semantic information about the object
it represents.
We can use the sentence-transformers library to generate embeddings with hundreds of
dimensions, and then reduce them to two dimensions with Principal Component Analysis.
Don't get too caught up in the code. What's important is that we use an embedding
model, in this case all-MiniLM-L6-v2, to generate vectors of 384 dimensions, and by doing so
we find that some sentences form clusters due to semantic similarity.

import numpy as np
import matplotlib.pyplot as plt
from sentence_transformers import SentenceTransformer
from sklearn.decomposition import PCA

# Load a pre-trained sentence transformer model


model = SentenceTransformer('all-MiniLM-L6-v2')

# List of sentences to visualize


sentences = [
# Medical professions
"The doctor examines patients in the hospital.",
"The PA cares for people in the clinic.",
"The surgeon operates on cases in the theater.",
"The psychiatrist counsels clients in the office.",

# Educational professions
"The teacher instructs students in the classroom.",
"The professor lectures learners in the auditorium.",
"The tutor coaches pupils in the study space.",
"The principal manages staff in the school.",

# Technology professions
"The programmer writes code in the workspace.",
"The data engineer creates a data pipeline from home.",
"The data scientist analyzes patterns in the lab.",
"The network engineer maintains systems in the server room.",

# Culinary professions
"The chef prepares meals in the kitchen.",
"The baker creates pastries in the bakery.",
"The sommelier selects wines in the restaurant.",
"The butcher slices meat in the shop."
]

# Get sentence embeddings
embeddings = model.encode(sentences)

# Reduce to 2 dimensions using PCA
pca = PCA(n_components=2)
embeddings_2d = pca.fit_transform(embeddings)

# Plotting
plt.figure(figsize=(15, 10))
plt.scatter(embeddings_2d[:, 0], embeddings_2d[:, 1], alpha=0.7)

# Add annotations
for i, sentence in enumerate(sentences):
    plt.annotate(sentence,
                 (embeddings_2d[i, 0], embeddings_2d[i, 1]),
                 xytext=(5, 5),
                 textcoords='offset points',
                 ha='left',
                 va='bottom',
                 bbox=dict(boxstyle='round,pad=0.5', fc='yellow', alpha=0.5),
                 arrowprops=dict(arrowstyle='->', connectionstyle='arc3,rad=0'))

plt.title('2D Sentence Embedding Space', fontsize=16)
plt.xlabel('Principal Component 1', fontsize=12)
plt.ylabel('Principal Component 2', fontsize=12)
plt.grid(True)
plt.tight_layout()
plt.show()

The evolution of embedding models:
A shift towards versatility
Historically, the embedding model landscape was highly segmented, with different models
specialized for various data types and granularities. However, recent advancements have
dramatically reshaped this field, blurring the lines between these categories and introducing
more versatile, powerful models.
The historical perspective
In the past, we typically categorized embedding models as follows:
Word embeddings: Models like Word2Vec and GloVe, focused on representing individual
words.
Sentence embeddings: Specialized models like early BERT variants and Universal
Sentence Encoder, designed to capture meaning at the sentence level.
Document embeddings: Models like Doc2Vec, aimed at representing entire documents
or longer texts.
Visual embeddings: Focused on representing images or video frames, often using models
like ResNet.
Graph embeddings: Specialized in capturing relationships in graph-structured data, with
models like node2vec.
This segmentation meant that different tasks often required different embedding models,
leading to complex pipelines and potential inconsistencies across applications.

The modern landscape: versatility, multimodality, and power
Today's embedding model ecosystem looks quite different. The trend has shifted towards more
versatile, powerful models that can handle a wide range of data types and lengths. Here's a
snapshot of the current landscape:
Text embedding models:
OpenAI's text-embedding-3-large: A versatile model capable of handling everything from
short phrases to long documents.
NVIDIA's NV-Embed-v1: Offering high performance across various text lengths and tasks.
Voyage AI's voyage-large-2-instruct: Specialized for enterprise applications but versatile
across text types.
These models have essentially eliminated the need for separate word, sentence, and document
embedding models. They can effectively embed anything from a single word to a multi-page
document, often with thousands of tokens in a single context window.
There has also been a trend towards multimodal embedding models—models that work with
different kinds of data. These are very useful in practice, because the search query is often in
text form, even if the data being searched isn't (a short CLIP sketch follows the lists below).
Image embedding models:
OpenAI's CLIP: While originally groundbreaking for its multimodal capabilities, it remains a
strong choice for image embeddings.
Meta's ImageBind: Offering state-of-the-art performance in image embedding tasks.
Audio embedding models:
Meta's ImageBind: Surprisingly, this model excels not just in images but also in audio
embedding tasks.
Facebook AI's wav2vec2: Specialized for audio and speech recognition tasks.
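
As one illustration of cross-modal embeddings, here is a minimal sketch using the CLIP checkpoint distributed through sentence-transformers; the image path and captions are placeholders:

from PIL import Image
from sentence_transformers import SentenceTransformer, util

# CLIP maps both images and text into the same vector space.
model = SentenceTransformer('clip-ViT-B-32')

# Embed an image and a few candidate captions.
image_embedding = model.encode(Image.open('dog_in_park.jpg'))
text_embeddings = model.encode([
    "a dog playing in a park",
    "a plate of spaghetti",
    "a city skyline at night",
])

# Cosine similarity tells us which caption best matches the image.
print(util.cos_sim(image_embedding, text_embeddings))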

Key advancements
Unified text understanding: Modern text embedding models can handle various text
lengths, from short queries to full documents, in a single model.
Increased context windows: Many models now support thousands of tokens, allowing for
more comprehensive document understanding.
Multimodal integration: Models like ImageBind and CLIP are breaking down barriers
between different data types, allowing for unified search and analysis across text,
images, and audio.
Improved performance: These models often outperform their specialized predecessors
across various tasks.

Implications for AI applications

This shift towards versatile, powerful embedding models has significant implications:
Simplified pipelines: The same model can now be used for embedding user queries and
documents, streamlining search and retrieval systems.
Cross-modal search: It's now possible to create search systems that can find relevant
images or audio clips from text queries, and vice versa.
Improved consistency: Using a single model across different data types and lengths
ensures more consistent representations and better overall system performance.
As the field continues to evolve, we can expect even more powerful and versatile embedding
models, further simplifying AI pipelines while improving performance across a wide range of
tasks and data types.

The modern landscape of
embedding models

PART 3
The embedding model ecosystem has evolved significantly, with distinctions between open-
source, closed-source (proprietary), and domain-specific models. Each category offers unique
advantages for different use cases and deployment scenarios.
Open-source models
Open-source embedding models provide transparency, customizability, and can run locally
without API dependencies. The MTEB (Massive Text Embedding Benchmark) leaderboard is an
invaluable resource for identifying top-performing open-source models.
Key advantages of open-source models:
Full control over deployment and customization
No ongoing API costs
Privacy-preserving (data doesn't leave your infrastructure)
Community-driven improvements and support
When choosing an open-source model, it's often best to consider the current top performers on
the MTEB leaderboard, particularly in the 'retrieval' category. As of this writing, the top model
for retrieval tasks is `gte-Qwen2-7B-instruct`. However, it's important to note that this can change
frequently as new models are developed and added to the leaderboard.

Here's how you might use the current top-performing model:

!pip install sentence-transformers flash_attn  # flash_attn is necessary for gte-Qwen2-7B-instruct

from sentence_transformers import SentenceTransformer

# Note: just because a model is a top performer doesn't make it ideal for your use case.
# This model has about 7.6 billion parameters. It's massive and hard to work with.
model = SentenceTransformer("Alibaba-NLP/gte-Qwen2-7B-instruct", trust_remote_code=True)

# In case you want to reduce the maximum sequence length:
model.max_seq_length = 8192

queries = [
    "how much protein should a female eat",
    "summit define",
]

documents = [
    "As a general guideline, the CDC's average requirement of protein for women ages 19 to 70 is 46 grams per day. But, as you can see from this chart, you'll need to increase that if you're expecting or training for a marathon. Check out the chart below to see how much protein you should be eating each day.",
    "Definition of summit for English Language Learners. : 1 the highest point of a mountain : the top of a mountain. : 2 the highest level. : 3 a meeting or series of meetings between the leaders of two or more governments.",
]

query_embeddings = model.encode(queries, prompt_name="query")
document_embeddings = model.encode(documents)

scores = (query_embeddings @ document_embeddings.T) * 100
print(scores.tolist())

When evaluating open-source models on MTEB, focus on:
Retrieval average score
Number of dimensions of the vectors generated
Model size (for deployment considerations)
Sequence length supported
Recency of the model's addition to the leaderboard
It's often best to use a smaller model, because they're much easier to work with and
often provide good-enough accuracy. Generally, it's not recommended to go with the best
model, because of latency and memory considerations.
Open-source models have one major drawback: inference. While it's easy to embed your
dataset locally, at inference time (when you are embedding the user query to search your
documents, which are likely in a vector database such as KDB.AI), it's often challenging to
deploy these embedding models.
There are lots of providers that let you run inference for common models, Cloudflare and
HuggingFace for example, but less common or cutting-edge models may not be supported.
It's also possible that the provider is inconsistent, embeds queries too slowly, or changes
their API. Still, you can switch providers if needed when you are using a common model such
as bge-base-en, which can't be said about closed-source providers.
However, using an open-source model allows you to embed on your own infrastructure. This
means you can use a distributed architecture to embed your massive dataset of billions of
vectors effectively and (relatively) cheaply, and also means you can deploy your own
optimized inference engine physically closer to your vector database, getting much faster
inference times in the process.

Closed-source models
Closed-source models, typically accessed via APIs, often provide state-of-the-art performance
and are continuously updated by their providers.
Key advantages of closed-source models:
Often cutting-edge performance
Regular updates and improvements
Simplified deployment through APIs
Potential for specialized models
Popular closed-source embedding models:
OpenAI embeddings: Models like text-embedding-3-small and text-embedding-3-large offer
strong general-purpose performance. One thing to note is that because larger embedding
models can outperform smaller embedding models even when their output size is shortened, it often
makes sense to use a larger embedding model and then compress it.
The OpenAI API allows you to adjust the dimensions parameter to decide the size of your
embedding. It still takes slightly longer to generate a shortened embedding, but this
approach might help improve accuracy while using even less memory than an
uncompressed small embedding (see the sketch after this list).
Cohere embeddings: Offers models like embed-multilingual-v3.0 with strong multilingual
support.
Voyage AI: Offers well-performing models such as voyage-large-2-instruct, which was at the
top of the MTEB leaderboard until recently.
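
For example, here is a minimal sketch of requesting a shortened embedding through the dimensions parameter with the official openai Python client (the input string is just a placeholder):

from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

response = client.embeddings.create(
    model="text-embedding-3-large",
    input="Embedding models map text into vectors.",
    dimensions=256,  # shortened embedding instead of the full 3072 dimensions
)

embedding = response.data[0].embedding
print(len(embedding))  # 256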

Closed-source models are really easy to get started with, and might be a good choice for a side
project. With VoyageAI, you get 50M tokens embedded for free. It also makes sense to use one
if you need a domain-specific model.
However, as you scale, these models get expensive, and lock you into their inference APIs,
which can be very slow (200-500ms). Running inference on an open-source model can be much
cheaper and faster in many cases, but may require some experience deploying ML models. You
can always use HuggingFace Inference, but this can be slow. That's why it makes sense to use a
common open-source embedding model, which allows you to easily change providers if
necessary.
For embedding your dataset, it often doesn't matter whether you use an open-source model or
a closed-source model from a speed perspective. Your latency will be similar. Here, we compare
embedding 5000 chunks with a few common providers, and FastEmbed (a library for fast
embedding inference, which loads and generates embeddings faster than sentence-
transformers) on CPU. On a GPU, there isn't much of a difference in terms of latency.
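
For reference, here is a minimal sketch of local embedding with FastEmbed; the model shown is one of its supported defaults, and the exact API may differ slightly across versions:

from fastembed import TextEmbedding

# FastEmbed runs ONNX-optimized models locally, with no API calls.
model = TextEmbedding(model_name="BAAI/bge-small-en-v1.5")

chunks = [
    "Embedding models map text into dense vectors.",
    "Vector databases index those vectors for fast similarity search.",
]

# embed() returns a generator of numpy arrays, one vector per chunk.
embeddings = list(model.embed(chunks))
print(len(embeddings), embeddings[0].shape)  # 2 (384,)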

Disclaimer: The following metrics are unofficial and for demonstration purposes only. They
illustrate typical latency levels but should not be considered definitive or current. When
evaluating embedding models, always conduct your own performance tests to ensure accuracy
and relevance to your specific use case. Actual speeds may vary and could have improved since
publication. The length of your chunks may have an impact on latency as well.

The problem begins at inference time. Closed-source providers batch requests, which is fine
for embedding your dataset, but at query time results in poor performance. Cohere,
VoyageAI, and OpenAI all take hundreds of milliseconds.
This eval was run several times, and there was a large amount of variation throughout the
day, which means you may get much faster responses (sometimes Cohere averaged 100ms,
most of which is likely over-the-wire time).
Still, this is far too slow for generating one embedding. If we are using a very fast inference
server like text-embeddings-inference, we would be able to get <1ms inference time on a
GPU. This is reason enough to think twice before using a closed-source provider, as it becomes
challenging to make these kinds of optimizations later. Note that the FastEmbed latency is
once again on a CPU.

If you want to make RAG as fast as possible, make sure your inference GPU is physically close
to your vector database, and ideally your reranker as well. A common setup is performing
requests one after the other, and if the components are far apart, even the speed of light
can add up quickly.

Domain-specific models
An emerging trend in the embedding model landscape is the development of domain-specific
models. These models are fine-tuned or specifically trained for particular industries or types of
data, offering superior performance in their specialized domains. These models are not easy to
train, and they perform well across an entire field, not just on a specific dataset.
Key advantages of domain-specific models:
Optimized performance for specific use cases
Better understanding of domain-specific terminology and context
Potential for improved accuracy in specialized tasks
Examples of domain-specific models:
Voyage AI models:
voyage-finance-2: Optimized for financial text and market insights
voyage-law-2: Specialized for legal document analysis
voyage-code-2: Tailored for programming languages and software engineering tasks
Cohere Multilingual Model:
Designed to handle multiple languages effectively, crucial for global applications.

Fine-tuned models
For many use cases, a general text model or a domain-specific model will perform well. But if
your text is specific enough, then fine-tuning can give 20-40% boosts in retrieval accuracy.
Some closed-source providers have a fine-tuning interface, such as Cohere. This is useful
because these providers will also deploy your model.
It's also not challenging to fine-tune an embedding model on your own data, as long as you
have query-relevant document pairs to train on. These can often be generated with an LLM.
This makes sense on something like medical data—a general embedding model might not be
able to understand certain concepts and terms, and therefore will perform poorly.
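
As a rough sketch of what that can look like with the sentence-transformers training API (the medical query-passage pairs below are invented placeholders, and in practice you would want thousands of them):

from sentence_transformers import SentenceTransformer, InputExample, losses
from torch.utils.data import DataLoader

# Start from a small general-purpose model and adapt it to your domain.
model = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")

# (query, relevant passage) pairs -- in practice these might be LLM-generated.
train_examples = [
    InputExample(texts=["What does metformin treat?",
                        "Metformin is a first-line medication for type 2 diabetes."]),
    InputExample(texts=["Symptoms of hypothyroidism",
                        "Hypothyroidism commonly causes fatigue, weight gain, and cold intolerance."]),
]
train_dataloader = DataLoader(train_examples, shuffle=True, batch_size=2)

# MultipleNegativesRankingLoss treats the other passages in a batch as negatives,
# so only positive pairs are required.
train_loss = losses.MultipleNegativesRankingLoss(model)

model.fit(train_objectives=[(train_dataloader, train_loss)], epochs=1, warmup_steps=10)
model.save("all-MiniLM-L6-v2-medical-finetuned")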

Choosing the right model
When selecting an embedding model, consider the following:
Open-source vs. closed-source: Evaluate based on your needs for customization, privacy,
and deployment flexibility.
Domain specificity: For specialized applications, domain-specific models might
significantly outperform general-purpose ones.
MTEB leaderboard: For open-source models, refer to the latest MTEB rankings,
particularly the retrieval tab for search-related tasks.
Resource constraints: Consider model size and computational requirements, especially
for on-premise deployments.
Multilingual needs: If your application deals with multiple languages, prioritize models
with strong multilingual support.
Latency and throughput: Evaluate the model's performance in terms of speed and ability
to handle your expected query volume. Closed-source models are especially important to
benchmark. Since they batch requests, they might be unusable for inference. Your user
shouldn't be waiting 500ms for a response.
Always perform thorough evaluations on your specific dataset and use case, regardless of
the model's general performance or ranking. The embedding model landscape is rapidly
evolving, so staying updated with the latest developments and periodically re-evaluating
your choices is crucial for maintaining optimal performance in your AI applications.
If you have evals, use them.

The power of reranking
While choosing the right embedding model is crucial, the emergence of reranking techniques
has revolutionized the way we approach semantic search and retrieval. Reranking allows us to
use smaller, more efficient embedding models while still achieving high-quality results.
Reranking is a two-stage process:
Initial retrieval: Use a lightweight embedding model to retrieve a larger set of potentially
relevant documents from a vector database (e.g. top 50 or even top 1000).
Reranking: Apply a more sophisticated model to reorder these results, pushing the most
relevant documents to the top.
This approach combines the efficiency of simpler models with the accuracy of more complex
ones.
Benefits of reranking
Improved accuracy: Rerankers can consider more nuanced relationships between queries
and documents, leading to better search results.
Efficiency: Allows for using smaller, faster embedding models for initial retrieval.
Flexibility: Can be added to existing search systems without completely overhauling the
infrastructure.
Reranking and smaller embedding models
Reranking enables the use of smaller, more efficient embedding models for several reasons:
Reduced precision requirements: The initial embedding model only needs to be good
enough to capture a broad set of potentially relevant documents.
Computational trade-off: The computational cost has shifted from the embedding stage to
the reranking stage, which operates on a much smaller set of documents.
Dimensionality reduction: Smaller embedding models often produce lower-dimensional
vectors, reducing storage and retrieval costs.
For example, you might use a lightweight model like MiniLM to generate 384-dimensional
embeddings for your entire corpus, and then apply a more sophisticated reranker like Cohere's
Rerank or an open-source cross-encoder to fine-tune the results.

Limitations of reranking: While powerful, reranking does come with some limitations:
Latency: Reranking typically adds 50-500ms to the query time, depending on the model
and number of documents, as well as the length of the documents. Longer documents
take much longer to rerank.
Document limit: Most rerankers have a practical limit of around 1000 documents they can
process per query.
Additional complexity: Introduces another component to the search stack that needs to
be managed and monitored.

Example of using Cohere rerank for reranking:

import cohere
co = cohere.Client('{apiKey}')

query = 'What is the capital of the United States?'

docs = [
    'Carson City is the capital city of the American state of Nevada.',
    'The Commonwealth of the Northern Mariana Islands is a group of islands in the Pacific Ocean. Its capital is Saipan.',
    'Washington, D.C. (also known as simply Washington or D.C., and officially as the District of Columbia) is the capital of the United States. It is a federal district.',
    'Capital punishment (the death penalty) has existed in the United States since before the United States was a country. As of 2017, capital punishment is legal in 30 of the 50 states.'
]

results = co.rerank(query=query, documents=docs, top_n=3, model='rerank-english-v3.0')

For those looking for open-source alternatives to Cohere and VoyageAI rerankers, open-source
cross-encoders offer a compelling option:
Performance: Open-source cross-encoders like those from the `sentence-transformers` library
can achieve comparable performance to proprietary solutions.
Speed: Many self-deployed open-source cross-encoders are optimized for speed
and can be significantly faster than some API-based solutions.
Customization: Can be fine-tuned on domain-specific data for improved performance. It
should be noted that Cohere also allows you to fine-tune their reranker, and it only requires a
few hundred examples.

Integrating reranking into
your search stack
To leverage reranking effectively:
Start with a lightweight embedding model for initial retrieval.
Retrieve a larger set of potential matches (e.g. top 1000) using fast approximate nearest
neighbor search.
Apply a reranker to this subset to get the final, highly relevant results.
This approach allows you to balance efficiency and accuracy, often achieving better results
than using a single, more complex embedding model alone.
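
Here is a minimal sketch of that two-stage flow using sentence-transformers, with an in-memory search standing in for the vector database; the model names are common public checkpoints:

from sentence_transformers import SentenceTransformer, CrossEncoder, util

# Stage 1: lightweight bi-encoder for the initial retrieval.
retriever = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")
# Stage 2: cross-encoder reranker that scores query-document pairs directly.
reranker = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")

corpus = [
    "Washington, D.C. is the capital of the United States.",
    "Carson City is the capital city of Nevada.",
    "Capital punishment has existed in the United States since before it was a country.",
]
corpus_embeddings = retriever.encode(corpus, convert_to_tensor=True)

query = "What is the capital of the United States?"
query_embedding = retriever.encode(query, convert_to_tensor=True)

# Retrieve a broad candidate set with fast vector search (top 50+ in a real system).
hits = util.semantic_search(query_embedding, corpus_embeddings, top_k=3)[0]

# Rerank the candidates with the cross-encoder and sort by its scores.
pairs = [(query, corpus[hit["corpus_id"]]) for hit in hits]
scores = reranker.predict(pairs)
reranked = sorted(zip(scores, pairs), key=lambda x: float(x[0]), reverse=True)
print(reranked[0][1][1])  # best-matching document after reranking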

Selecting the right
embedding model: A
systematic approach

PART 4
Choosing the optimal embedding model involves considering various factors and trade-offs.
Here's a step-by-step approach:
Know what you need
First things first, let's figure out what you're actually looking for:
What's your main goal? Semantic search? Classification? Something else? The MTEB
leaderboard has different rankings for each task.
Are you dealing with legal jargon, medical terms, or just everyday language? If so, you
might be better off using a domain-specific model.
Need to handle multiple languages, or just one? If you are handling multiple languages,
then Cohere's Multilingual Model might perform better, especially for a clustering task.
How big is your data, and how many queries are you expecting? If you are working with a
lot of data, a smaller model will be much easier to work with. An alternative is to
compress your vectors, either by compressing before inserting into a vector database, or
using an index that incorporates compression.
How fast does it need to be? Are we talking milliseconds or can you afford a few
seconds? If you are willing to endure slow search speeds, then it might make sense to get
the performance boost of a more powerful embedding model.

Look under the hood

Bigger isn't always better. More dimensions mean more detail, but also more
computational muscle. It's possible a smaller model can perform better on your specific
data, while greatly reducing latency.
Make sure the model can handle your text length. Sometimes it makes sense to embed
entire documents, instead of chunking them into parts. In that case, an embedding
model with a large context is necessary.
Check if the model's training data matches your domain. A model trained on tweets
might struggle with legal documents.

Is the model actively maintained? You should assume closed-source providers will simply
stop offering the embedding model you are considering in the next few years (or sooner),
which will force you to re-embed all your data with a newer version. Similarly, a trendy
embedding model near the top of the MTEB leaderboard might not have been tested in
production, for a host of reasons.

Face reality
Time to consider the practical stuff:
What kind of hardware do you have? Some models need serious computing power.
Money matters. API calls can add up, and self-hosting isn't free either.
How sensitive is your data? Cloud solutions are convenient, but on-premise might be
necessary for some.
Think about how it'll fit into your current setup. The best model in the world is useless if you
can't integrate it. Especially for open-source models, having an external API is useful not
only for production, but also testing.
Do your users need the best possible results? If you are offering a free or freemium product,
then they might be more than satisfied with 'pretty-good' results. Engineering time spent
working with larger embedding models might be better spent optimizing other parts of
the pipeline, such as improving citations, LLM responses, adding hybrid search, reranking,
or creating evals.

Test, test, test

Don't just take anyone's word for it:
Create a test dataset that looks like your real-world data.
Run some A/B tests. See how different models perform in a setting close to your actual
use case.
Keep an eye on the important numbers. MRR, nDCG – these aren't just fancy acronyms,
they'll tell you what's working.

Fine-tune your choice
Once you've picked a model, you're not done yet. Consider these tweaks:
Fine-tuning: Teach an old model new tricks by adapting it to your specific needs. This is
very effective, but it's important to remember that this custom model will then need to be
efficiently deployed.
Quantization: Trim the fat. Compress your vectors to save memory in your vector
database and to improve retrieval speeds (see the sketch after this list).
Distillation: Create a 'mini-me' of your model that's almost as smart but much faster.
Caching: Why calculate the same thing twice? Save those frequent flyers. If you have a
search box, add some examples for the user to click, which can easily be cached.
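
Here is a rough sketch of the quantization idea using the helper shipped with recent versions of sentence-transformers; exact savings and API details depend on your version and chosen precision:

from sentence_transformers import SentenceTransformer
from sentence_transformers.quantization import quantize_embeddings

model = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")

sentences = ["The chef prepares meals in the kitchen.",
             "The programmer writes code in the workspace."]

# float32 embeddings: 384 dimensions x 4 bytes each
embeddings = model.encode(sentences)

# int8 quantization cuts storage roughly 4x; 'binary' would cut it ~32x at some accuracy cost.
int8_embeddings = quantize_embeddings(embeddings, precision="int8")

print(embeddings.dtype, embeddings.nbytes)            # float32, 3072 bytes for two vectors
print(int8_embeddings.dtype, int8_embeddings.nbytes)  # int8, 768 bytes for two vectors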

Strategies for evaluating
an embedding model

PART 5
Evaluating embedding models is crucial for ensuring that your chosen model performs well on
your specific data and use case. Here are some strategies and best practices for conducting a
thorough evaluation:
Creating a representative test set
The first step in evaluating embedding models is to create a test set that accurately
represents your data and use case. This test set should include:
A diverse range of documents that cover the breadth of your corpus
A set of realistic queries that users might ask
Relevance judgments that indicate which documents are relevant to each query

For generating queries and relevance judgments, you have several options:
Manual creation: Have domain experts create queries and judge document relevance.
This is time-consuming but can provide high-quality, domain-specific evaluations.
LLM-assisted generation: Use a large language model (LLM) to generate queries based on
your documents. This can be an efficient way to create a large test set quickly. Long-
context models (LLMs that can process over 100k tokens or hundreds of pages of data) are
especially good at this task.
RAGAS library: Utilize the RAGAS library, which provides tools for synthetic test set
generation.

Here's an example of using an LLM (in this case, GPT-4) to generate queries and
relevance judgments:

import openai
import json

openai.api_key = 'OPENAI_API_KEY'

def generate_queries_and_judgments(documents, num_queries=5):
    prompt = f"""
    Given the following 30 documents covering various topics including animals, science,
    history, literature, and technology, generate {num_queries} diverse queries. For each query,
    provide the indices of the top 3 most relevant documents in order of relevance.

    Documents:
    {json.dumps(documents, indent=2)}

    Output format:
    [
      {{
        "query": "Generated query here",
        "relevant_docs": [index1, index2, index3]
      }},
      ...
    ]
    Example output:
    [
      {{
        "query": "What are some characteristics of big cats?",
        "relevant_docs": [3, 0, 9]
      }},
      {{
        "query": "Explain key concepts in modern physics",
        "relevant_docs": [10, 11, 14]
      }}
    ]
    Ensure that each query has exactly 3 relevant document indices, ordered by relevance. Try
    to cover a range of topics from the document set.
    """
    response = openai.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "user", "content": prompt}],
        temperature=0.7,
    )
    return json.loads(response.choices[0].message.content)

# Example usage
documents = [
    "Cats are known for their independent nature and grooming habits.",
    "Dogs are often called man's best friend due to their loyalty.",
    "Elephants are the largest land animals and are known for their intelligence.",
    "Lions are apex predators and are often called the kings of the jungle.",
    "Dolphins are highly intelligent marine mammals known for their playful behavior.",
    "Pandas are endangered bears native to China, known for eating bamboo.",
    "Eagles are birds of prey with excellent eyesight and powerful talons.",
    "Penguins are flightless birds adapted to life in the cold Antarctic waters.",
    "Giraffes are the tallest land animals, known for their long necks.",
    "Cheetahs are the fastest land animals, capable of short bursts of extreme speed.",
    "The theory of relativity was proposed by Albert Einstein in the early 20th century.",
    "Quantum mechanics describes the behavior of matter and energy at the atomic scale.",
    "The periodic table organizes chemical elements based on their atomic structure.",
    "Photosynthesis is the process by which plants convert sunlight into energy.",
    "Gravity is the force of attraction between all masses in the universe.",
    "The Renaissance was a period of cultural rebirth in Europe from the 14th to 17th centuries.",
    "The Industrial Revolution marked a shift from hand production to machine manufacturing.",
    "World War II was a global conflict that lasted from 1939 to 1945.",
    "The French Revolution was a period of social and political upheaval in France.",
    "The Cold War was a period of geopolitical tension between the US and Soviet Union.",
    "Shakespeare is considered one of the greatest playwrights in English literature.",
    "The Odyssey is an ancient Greek epic poem attributed to Homer.",
    "To Kill a Mockingbird is a novel by Harper Lee addressing racial injustice.",
    "1984 is a dystopian novel by George Orwell about totalitarian control.",
    "The Great Gatsby by F. Scott Fitzgerald explores the American Dream in the 1920s.",
    "Python is a high-level programming language known for its simplicity and readability.",
    "Machine learning is a subset of AI that focuses on creating learning algorithms.",
    "Blockchain is a decentralized, distributed ledger technology.",
    "Cloud computing delivers computing services over the internet.",
    "Cybersecurity involves protecting systems and networks from digital attacks."
]

queries_and_judgments = generate_queries_and_judgments(documents)

def format_results(query_results, documents):
    formatted_results = []
    for result in query_results:
        formatted_result = {
            "query": result["query"],
            "relevant_documents": [
                {
                    "index": idx,
                    "content": documents[idx]
                } for idx in result["relevant_docs"]
            ]
        }
        formatted_results.append(formatted_result)
    return formatted_results

formatted_output = format_results(queries_and_judgments, documents)

# Optionally print the formatted results as JSON
# print(json.dumps(formatted_output, indent=2))

# Optionally, you can also print a more human-readable version:
print("\nHuman-readable format:")
for result in formatted_output:
    print(f"\nQuery: {result['query']}")
    print("Relevant documents:")
    for doc in result['relevant_documents']:
        print(f"  {doc['index']}: {doc['content']}")

Query: What are some unique traits of aquatic animals?

Relevant documents:
4: Dolphins are highly intelligent marine mammals known for their playful behavior.
7: Penguins are flightless birds adapted to life in the cold Antarctic waters.
2: Elephants are the largest land animals and are known for their intelligence.

Query: Describe the impact of certain historical events

Relevant documents:
16: The Industrial Revolution marked a shift from hand production to machine manufacturing.
17: World War II was a global conflict that lasted from 1939 to 1945.
18: The French Revolution was a period of social and political upheaval in France.
Query: Overview of famous works in English literature
Relevant documents:
20: Shakespeare is considered one of the greatest playwrights in English literature.
22: To Kill a Mockingbird is a novel by Harper Lee addressing racial injustice.
23: 1984 is a dystopian novel by George Orwell about totalitarian control.

Query: Clarify core concepts in the field of science

Relevant documents:
13: Photosynthesis is the process by which plants convert sunlight into energy.
12: The periodic table organizes chemical elements based on their atomic structure.
10: The theory of relativity was proposed by Albert Einstein in the early 20th century.

This script uses GPT-4 to generate queries and relevance judgments based on a given set of
documents. The output will be a list of queries, each with the indices of the top 3 most
relevant documents.

Implementing evaluation metrics

Once you have your test set, you'll need to implement relevant evaluation metrics.
Common metrics for evaluating embedding models include:
Normalized Discounted Cumulative Gain (nDCG): Measures the quality of ranking,
taking into account both relevance and position.
Mean Reciprocal Rank (MRR): Evaluates how well the model ranks the first relevant
document.
Precision@k: Measures the proportion of relevant documents in the top k results.

Here's a Python implementation of these metrics:

import numpy as np
from sklearn.metrics.pairwise import cosine_similarity
from sentence_transformers import SentenceTransformer

def dcg_at_k(relevances, k):
    """Calculate Discounted Cumulative Gain at k"""
    dcg = 0
    for i in range(min(len(relevances), k)):
        dcg += relevances[i] / np.log2(i + 2)
    return dcg

def ndcg_at_k(predicted_indices, true_indices, k):
    """Calculate Normalized Discounted Cumulative Gain at k"""
    predicted_relevances = [1 if idx in true_indices else 0 for idx in predicted_indices[:k]]
    ideal_relevances = [1] * len(true_indices) + [0] * (k - len(true_indices))

    dcg = dcg_at_k(predicted_relevances, k)
    idcg = dcg_at_k(ideal_relevances, k)

    return dcg / idcg if idcg > 0 else 0

def mean_reciprocal_rank(predicted_indices, true_indices):
    """Calculate Mean Reciprocal Rank"""
    for i, idx in enumerate(predicted_indices):
        if idx in true_indices:
            return 1 / (i + 1)
    return 0

def precision_at_k(predicted_indices, true_indices, k):
    """Calculate Precision at k"""
    predicted_set = set(predicted_indices[:k])
    true_set = set(true_indices)
    return len(predicted_set.intersection(true_set)) / k

def evaluate_model(model_name, documents, queries):
    model = SentenceTransformer(model_name)
    doc_embeddings = model.encode(documents)

    ndcg_scores = []
    mrr_scores = []
    precision_scores = []

    for query in queries:
        query_embedding = model.encode([query['query']])
        similarities = cosine_similarity(query_embedding, doc_embeddings)[0]
        ranked_indices = np.argsort(similarities)[::-1]

        ndcg_scores.append(ndcg_at_k(ranked_indices, query['relevant_docs'], k=3))
        mrr_scores.append(mean_reciprocal_rank(ranked_indices, query['relevant_docs']))
        precision_scores.append(precision_at_k(ranked_indices, query['relevant_docs'], k=3))

    return {
        'ndcg@3': np.mean(ndcg_scores),
        'mrr': np.mean(mrr_scores),
        'precision@3': np.mean(precision_scores)
    }

# Example usage
model_name = "sentence-transformers/all-MiniLM-L6-v2"
results = evaluate_model(model_name, documents, queries_and_judgments)
print(f"Results for {model_name}:")
print(f"NDCG@3: {results['ndcg@3']:.4f}")
print(f"MRR: {results['mrr']:.4f}")
print(f"Precision@3: {results['precision@3']:.4f}")

Results for sentence-transformers/all-MiniLM-L6-v2:


NDCG@3: 0.7061
MRR: 0.8000
Precision@3: 0.7333

Comparing multiple models
To get a comprehensive view of model performance, it's important to evaluate multiple
models. This allows you to compare their strengths and weaknesses in the context of your
specific use case. Here's an example of how to evaluate and compare multiple models:

import matplotlib.pyplot as plt

models_to_evaluate = [
    "sentence-transformers/all-MiniLM-L6-v2",
    "sentence-transformers/all-mpnet-base-v2",
    "sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2",
    "BAAI/bge-large-en-v1.5"
]

results = {}
for model_name in models_to_evaluate:
    print(f"Evaluating {model_name}")
    model_results = evaluate_model(model_name, documents, queries_and_judgments)
    results[model_name] = model_results
    print(f"NDCG@3: {model_results['ndcg@3']:.4f}")
    print(f"MRR: {model_results['mrr']:.4f}")
    print(f"Precision@3: {model_results['precision@3']:.4f}")
    print()

# Visualize results
metrics = ['ndcg@3', 'mrr', 'precision@3']
x = np.arange(len(metrics))
width = 0.2
fig, ax = plt.subplots(figsize=(12, 6))

for i, (model_name, model_results) in enumerate(results.items()):
    ax.bar(x + i*width, [model_results[m] for m in metrics], width, label=model_name)

ax.set_ylabel('Scores')
ax.set_title('Embedding Model Comparison')
ax.set_xticks(x + width * 1.5)
ax.set_xticklabels(metrics)
ax.legend(loc='lower right')
plt.tight_layout()
plt.show()

This script generates a bar chart comparing the performance of different models across
the three metrics.

Using cross-encoders for approximate ordering

While having human-annotated relevance judgments is ideal, it's not always feasible,
especially for large datasets. In such cases, you can use cross-encoders or powerful
language models to generate approximate relevance orderings. The idea here is that no
matter how good an embedding model is, it will likely perform worse than a cross-
encoder/reranker.
We can even combine several strategies: create a RAG pipeline that uses a naive vector
search using our vector database, then reranks the top 1k results with a cross-encoder,
reranks the top 10 results with an LLM, and uses the top 10 results to calculate the NDCG
for the embedding model.
It's important to note that while this more complex pipeline would likely perform much
better, it might be too slow/expensive to use in practice.

Here's an example using the Cohere Rerank API as a cross-encoder to generate approximate
relevance orderings:

import cohere
import json

co = cohere.Client('COHERE_API_KEY')

def get_relevant_docs(query, documents, top_k=3):
    response = co.rerank(query=query, documents=documents, top_n=top_k,
                         model='rerank-english-v3.0')
    return [
        {
            "index": result.index,
            "relevance_score": result.relevance_score,
            "content": documents[result.index]
        } for result in response.results
    ]

queries = [
    "What are some unique attributes of birds?",
    "Describe some key events in European history",
    "Can you provide information on well-known works of literature?",
    "What are some facts about marine animals?",
    "What are some areas in the field of technology?"
]

formatted_results = []
for q in queries:
    try:
        relevant_docs = get_relevant_docs(q, documents)
        formatted_results.append({"query": q, "relevant_docs": relevant_docs})
    except Exception as e:
        print(f"Error processing query '{q}': {str(e)}")

# Print human-readable format
print("\nHuman-Readable Format:")
for result in formatted_results:
    print(f"\nQuery: {result['query']}")
    print("Relevant documents:")
    for doc in result['relevant_docs']:
        print(f"  - Index: {doc['index']}")
        print(f"    Relevance Score: {doc['relevance_score']:.6f}")
        print(f"    Content: {doc['content']}")
    print()

Query: What are some unique attributes of birds?
Relevant documents:
Index: 6
Relevance Score: 0.009020
Content: Eagles are birds of prey with excellent eyesight and powerful talons.
Index: 7
Relevance Score: 0.000915
Content: Penguins are flightless birds adapted to life in the cold Antarctic waters.
Index: 4
Relevance Score: 0.000291
Content: Dolphins are highly intelligent marine mammals known for their playful behavior.

Query: Describe some key events in European history

Relevant documents:
Index: 18
Relevance Score: 0.004102
Content: The French Revolution was a period of social and political upheaval in France.
Index: 15
Relevance Score: 0.003456
Content: The Renaissance was a period of cultural rebirth in Europe from the 14th to 17th
centuries.
Index: 16
Relevance Score: 0.001741
Content: The Industrial Revolution marked a shift from hand production to machine
manufacturing.

Query: What are some areas in the field of technology?
Relevant documents:
Index: 16
Relevance Score: 0.002183
Content: The Industrial Revolution marked a shift from hand production to machine
manufacturing.
Index: 26
Relevance Score: 0.001701
Content: Machine learning is a subset of AI that focuses on creating learning algorithms.
Index: 29
Relevance Score: 0.001623
Content: Cybersecurity involves protecting systems and networks from digital attacks.

You can then use this approximate ordering as the "ground truth" for your evaluation metrics
when human annotations are not available. You may notice there are some mistakes—a cross-
encoder will not give us the true ordering, but will in most cases be much better than an
embedding model, so it may be something we can evaluate our embedding model against.

Considering runtime and resource usage


While performance metrics are crucial, it's also important to consider the practical aspects of
using an embedding model, such as inference time and resource usage.
Here's an example of how you might measure these factors:

import time
import psutil
import torch

def measure_performance(model_name, documents, num_runs=5):
    model = SentenceTransformer(model_name)

    # Warm-up run
    _ = model.encode(documents)
    total_time = 0
    max_memory = 0

    for _ in range(num_runs):
        start_time = time.time()
        _ = model.encode(documents)
        end_time = time.time()

        total_time += end_time - start_time
        max_memory = max(max_memory, psutil.virtual_memory().percent)

    avg_time = total_time / num_runs

    return {
        'avg_inference_time': avg_time,
        'max_memory_usage': max_memory,
        'model_size': sum(p.numel() for p in model.parameters()) / 1e6  # Size in millions of parameters
    }

# Measure performance for each model
performance_results = {}
for model_name in models_to_evaluate:
    performance_results[model_name] = measure_performance(model_name, documents)
    print(f"Performance for {model_name}:")
    print(f"Average Inference Time: {performance_results[model_name]['avg_inference_time']:.4f} seconds")
    print(f"Max Memory Usage: {performance_results[model_name]['max_memory_usage']:.2f}%")
    print(f"Model Size: {performance_results[model_name]['model_size']:.2f}M parameters")
    print()

Performance for sentence-transformers/all-MiniLM-L6-v2:

Average Inference Time: 0.4574 seconds
Max Memory Usage: 40.90%
Model Size: 22.71M parameters

This script measures the average inference time, maximum memory usage, and model size
for each embedding model. You can visualize these results alongside the performance
metrics to get a comprehensive view of each model's trade-offs.
By following these strategies and using the provided code snippets, you can conduct a
thorough evaluation of embedding models for your specific use case. Remember to consider
both performance metrics and practical constraints when making your final decision.

Using KDB.AI for efficient evaluation
While our previous evaluation methods are effective, they can become slow when dealing
with large datasets. This is where vector databases like KDB.AI can significantly speed up the
evaluation process, especially for similarity search tasks.
KDB.AI offers a Flat index type, which is particularly well-suited for embedding model
evaluation. Here's why:
Exact search: The Flat index performs an exact nearest neighbor search, which is crucial
for accurate evaluation of embedding models.
Speed: Despite being an exact search, the Flat index in KDB.AI (http://kdb.ai/) is
optimized for performance, making it suitable for quick evaluations.
Simplicity: The Flat index is straightforward to set up and use, which is ideal for evaluation
scenarios where we want to focus on the embedding model's performance rather than
index complexity.
qFlat is the on-disk version of the Flat index, prioritizing capacity over speed. It's more cost-
effective but slower than the in-memory Flat index. To use the on-disk version, simply specify
'qFlat' instead of 'Flat' when creating the index.

Sign up for KDB.AI

Get started quickly with vector databases and small generative AI projects.
Free version limited to 4GB memory
Multiple tables and indexes
Language model agnostic
Supported distance metrics: Euclidean, Dot Product and Cosine similarity
Metadata filtering
Developer resources
Slack Support
For more information visit trykdb.kx.com/kdbai/signup/

Let's walk through the process of using KDB.AI for embedding model evaluation:

First, we need to set up a connection to KDB.AI and create a table with the appropriate
schema:

import kdbai_client as kdbai
from sentence_transformers import SentenceTransformer

# Connect to KDB.AI
session = kdbai.Session(api_key='KDBAI_API_KEY', endpoint='KDBAI_ENDPOINT')

def setup_kdbai_table(table_name, vector_dim):
    schema = {
        'columns': [
            {'name': 'id', 'pytype': 'str'},
            {'name': 'content', 'pytype': 'str'},
            {'name': 'vector', 'vectorIndex': {'dims': vector_dim, 'metric': 'L2', 'type': 'flat'}}
        ]
    }
    if table_name not in session.list():
        session.create_table(table_name, schema)
    return session.table(table_name)

# Usage
model = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")
vector_dim = model.get_sentence_embedding_dimension()
table = setup_kdbai_table("eval_all-MiniLM-L6-v2", vector_dim)

Next, we need to insert our document embeddings into the KDB.AI table:

def insert_data(table, documents, model):
    embeddings = model.encode(documents)

    data = [
        {
            'id': str(i),
            'content': doc,
            'vector': embedding.tolist()
        }
        for i, (doc, embedding) in enumerate(zip(documents, embeddings))
    ]
    table.insert(data)

# Usage
insert_data(table, documents, model)

Now that our data is in KDB.AI, we can perform similarity searches for each query and
calculate our evaluation metrics:

def evaluate_model_kdbai(model_name, documents, queries, k=3):
    model = SentenceTransformer(model_name)
    vector_dim = model.get_sentence_embedding_dimension()

    # Setup KDB.AI table
    table_name = f"eval_{model_name.replace('/', '_')}"
    table = setup_kdbai_table(table_name, vector_dim)

    # Insert document embeddings
    insert_data(table, documents, model)

    # Perform similarity search for each query
    results = []
    for query in queries:
        query_vector = model.encode([query['query']])[0]
        search_results = table.search(vectors=[query_vector.tolist()], n=k)
        retrieved_indices = search_results[0]['id'].astype(int).tolist()
        results.append({
            'query': query['query'],
            'retrieved': retrieved_indices,
            'relevant': query['relevant_docs']
        })

    # Calculate metrics
    ndcg_scores = [ndcg_at_k(result['retrieved'], result['relevant'], k) for result in results]
    mrr_scores = [mean_reciprocal_rank(result['retrieved'], result['relevant']) for result in results]
    precision_scores = [precision_at_k(result['retrieved'], result['relevant'], k) for result in results]

    return {
        'ndcg@k': np.mean(ndcg_scores),
        'mrr': np.mean(mrr_scores),
        f'precision@{k}': np.mean(precision_scores)
    }

This approach allows us to leverage KDB.AI's efficient similarity search capabilities while
maintaining the exact search nature required for accurate evaluation. It's particularly
beneficial when evaluating models on larger datasets or when you need to perform repeated
evaluations quickly.
When using this method, you can easily compare different embedding models by creating
separate tables for each model in KDB.AI.
For example:

models_to_evaluate = [
    "sentence-transformers/all-MiniLM-L6-v2",
    "thenlper/gte-small",
    "andersonbcdefg/bge-small-4096"
]

# Evaluation loop
results = {}
for model_name in models_to_evaluate:
    print(f"\nEvaluating {model_name}")
    results[model_name] = evaluate_model_kdbai(model_name, documents, queries_and_judgments)

# Print results
for model_name, result in results.items():
    print(f"\nResults for {model_name}:")
    print(f"NDCG@3: {result['ndcg@k']:.4f}")
    print(f"MRR: {result['mrr']:.4f}")
    print(f"Precision@3: {result['precision@3']:.4f}")

Results for sentence-transformers/all-MiniLM-L6-v2:

NDCG@3: 0.8000
MRR: 0.8000
Precision@3: 0.2667

Results for thenlper/gte-small:

NDCG@3: 0.7531
MRR: 0.9000
Precision@3: 0.7333

Results for andersonbcdefg/bge-small-4096:

NDCG@3: 0.6939
MRR: 0.9000
Precision@3: 0.6667

Important: These models all produce 384-dimensional vectors. If you evaluate models with
different vector sizes, you'll need to adjust the schema in your code accordingly.

This allows for quick switching between models during evaluation without the need to
recompute embeddings each time.
Remember, while the Flat index is excellent for evaluation purposes due to its exact search
nature, for production use cases with very large datasets, you might want to consider other
index types offered by KDB.AI that provide a better trade-off between search speed and
accuracy.
By incorporating KDB.AI into your embedding model evaluation pipeline, you can
significantly speed up the evaluation process, especially when dealing with larger datasets
or when you need to perform frequent evaluations of different models. The exact search
capabilities of the Flat index ensure that your evaluations are accurate, while KDB.AI's
optimized performance allows for quick iteration and comparison of different embedding
models.

Key takeaways

PART 6
As we've explored throughout this ebook, embedding models play a crucial role in modern
AI applications, serving as the foundation for tasks ranging from semantic search to multi-
modal understanding.
Diversity of options: From open-source powerhouses like the current MTEB leaders to
proprietary solutions like OpenAI's text-embedding-3-large, the embedding model
ecosystem offers solutions for various needs and constraints.
Performance vs. practicality: While leaderboard-topping models showcase impressive
benchmarks, practical considerations like deployment ease, inference speed, and resource
requirements often favor more modest but efficient models. It's also important to note that
your data is likely very different from MTEB benchmarks.
Domain specificity matters: For specialized fields like finance, law, or multilingual
applications, domain-specific models from providers like Voyage AI or Cohere can offer
significant advantages over general-purpose embeddings.
Open-source momentum: The rapid advancement of open-source models, as evidenced by
the MTEB leaderboard, is democratizing access to high-quality embeddings and enabling
more flexible, privacy-preserving implementations.
Continuous evaluation: Given the fast-paced nature of the field, regular re-evaluation of
embedding model choices is crucial for maintaining optimal performance in production
systems.
Optimization: Large vectors can be compressed, which can speed up search. Some vector
databases take advantage of this with data structures like IVF-PQ, which compress
vectors before building an index.

Looking ahead
As we move forward, several trends are likely to shape the future of embedding models:
Multimodal integration: Expect further advancements in models that can seamlessly
embed and relate different data types, building on the success of models like CLIP.
Efficiency at scale: With the growing size of datasets and the need for real-time
applications, research into more efficient embedding techniques that maintain high
performance will intensify. Working with billions of vectors is still challenging.
Customization and fine-tuning: Tools and techniques for easily adapting pre-trained
embedding models to specific domains or tasks will become more sophisticated and
accessible. Model quantization/vector compression will allow for using bulky models while
maintaining fast search speeds, and will be necessary for scale. This becomes especially
important once there are millions of vectors in a vector database.
The task of choosing the right embedding model isn't easy. It requires a balance of technical
knowledge, practical experimentation, and an understanding of the specific needs of your
application. By staying informed about the latest developments, critically evaluating options,
and continuously refining your approach, you can harness the power of embedding models to
build more intelligent, efficient, and effective AI systems.
Remember, the 'best' embedding model is not always the one with the highest benchmark
scores. It's the one that aligns most closely with your specific use case, deployment constraints,
and performance requirements. As you embark on your embedding model journey, it's
important to remember that a small model is often more than good enough—especially if there
is a reranking step.

The author
Michael Ryaboy is a Developer Advocate at KX specializing in
building search pipelines and fullstack AI applications. With four
years of experience in AI development, he has released a course
on vector databases and shipped AI products used by hundreds
of thousands of users. Michael shares content on Medium and
LinkedIn about improving search and RAG systems.
Connect with Michael on LinkedIn

About KX
Our mission is to accelerate data and AI-driven innovation with high performance analytics
solutions, enabling our customers to transform into AI-first enterprises. KX is trusted by the
world's top investment banks & hedge funds, aerospace and defense, life and health
sciences, semiconductor, telecommunications, and advanced manufacturing companies.
Time series and vector data analytics and management are at the heart of our products,
independently benchmarked as the fastest on the market. They help our customers process
data at unmatched speed and scale and empower LOB leaders, developers, data scientists,
and data engineers to build high-performance data-driven applications and turbocharge
their favorite analytics tools in the cloud, on premise, or at the edge.
KX technology enables the discovery of richer, actionable insights for faster, better informed
decision making which drives competitive advantage and transformative growth for our
customers. KX operates across North America, Europe, and Asia Pacific.
For more information visit www.kx.com
