Embeddings

Instructors
Prashant Sahu
Manager - Data Science, Analytics Vidhya
Ravi Theja
Developer Advocate Engineer, LlamaIndex
Why do we need Embeddings?
Indexing stage

Embeddings are numerical vector representations of textual chunks that capture the meaning and context of the text.
Why do we need Embeddings?
Retrieval stage

[Diagram: the user query is converted to embeddings; the retriever fetches the top-K nodes from the vector store/DB; the response synthesis module then passes them to the LLM, which produces the response.]
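A minimal sketch of the retrieval step, assuming the chunk embeddings are already stored as rows of a NumPy array (the function name top_k_nodes and the variable names are illustrative, not from any library):

import numpy as np

def top_k_nodes(query_emb, chunk_embs, k=3):
    # Cosine similarity between the query vector and every stored chunk vector
    sims = chunk_embs @ query_emb / (
        np.linalg.norm(chunk_embs, axis=1) * np.linalg.norm(query_emb)
    )
    # Indices of the k most similar chunks, best first
    return np.argsort(sims)[::-1][:k]

In a production system the vector store/DB performs this nearest-neighbour search with an approximate index rather than a brute-force scan.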
What are Embeddings?

[Diagram: pieces of text pass through an embedding model to produce numerical vectors.]

Embeddings represent text data in a numerical format. They capture the semantic relationships in the language.

Interpreting Embeddings
Cosine similarity measures how similar two vectors are in direction, regardless of their magnitude.
Its value ranges from -1 to 1, where 1 indicates that the vectors point in the same direction, 0
indicates no similarity (orthogonal vectors), and -1 indicates that they point in opposite directions.

cosine_similarity(A, B) = (A . B) / (||A|| ||B||)

where A and B are the text embedding vectors of two different pieces of text
(words, phrases, or document chunks).
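A quick check of the formula in NumPy (a minimal sketch; the toy vectors are made up purely for illustration):

import numpy as np

def cosine_similarity(a, b):
    # (A . B) / (||A|| ||B||)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

a = np.array([1.0, 2.0, 3.0])
print(cosine_similarity(a, a))                            # 1.0  (same direction)
print(cosine_similarity(a, -a))                           # -1.0 (opposite direction)
print(cosine_similarity(a, np.array([3.0, 0.0, -1.0])))   # 0.0  (orthogonal)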
Applications of Embeddings

1. Finding Most Similar Words
Word: "king"
Most Similar Words: ["queen", "monarch", "prince", "ruler", "emperor"]

2. Finding the Odd One Out
Words: ["breakfast", "lunch", "dinner", "car"]
Odd One Out: "car"

cosine_similarity(breakfast, avg_vector_embed) = 0.954
cosine_similarity(lunch, avg_vector_embed) = 0.965
cosine_similarity(dinner, avg_vector_embed) = 0.963
cosine_similarity(car, avg_vector_embed) = 0.891
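Both applications can be reproduced with pretrained word vectors, for example via gensim (a sketch under the assumption that the glove-wiki-gigaword-100 download is available; doesnt_match works exactly as above, comparing each word against the average vector):

import gensim.downloader as api

# Load pretrained GloVe word vectors (downloaded on first use)
wv = api.load("glove-wiki-gigaword-100")

# 1. Finding most similar words
print(wv.most_similar("king", topn=5))

# 2. Finding the odd one out: the word farthest from the average vector
print(wv.doesnt_match(["breakfast", "lunch", "dinner", "car"]))  # expected: "car"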
Applications of Embeddings

3. Sentence Similarity
Sentence 1: "The cat sits on the mat."
Sentence 2: "A feline is sitting on a rug."

4. Document Clustering
Cluster 1:
"AI is transforming the tech industry."
"The new AI model is impressive."
Cluster 2:
"Climate change impacts the environment."
"Renewable energy is the future."
Closed source embeddings:
OpenAI Embeddings
CohereAI Embeddings
Google Gemini Embeddings
JinaAI Embeddings

Open source embeddings:
BERT / DistilBERT
BGE
mpnet
e5
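A minimal sketch contrasting the two families (the model names are examples; the OpenAI call requires an OPENAI_API_KEY, while the BGE checkpoint from Hugging Face runs locally):

from openai import OpenAI
from sentence_transformers import SentenceTransformer

# Closed source: OpenAI's embedding API
client = OpenAI()
resp = client.embeddings.create(model="text-embedding-3-small", input="Hello, world")
print(len(resp.data[0].embedding))  # 1536 dimensions for this model

# Open source: a BGE checkpoint, run locally
model = SentenceTransformer("BAAI/bge-small-en-v1.5")
print(model.encode("Hello, world").shape)  # (384,) for this checkpoint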
How to select the right embeddings?
1. Look for domain-specific embeddings
2. State-of-the-art embeddings: consult the Massive Text Embedding Benchmark (MTEB)
3. Finetune embeddings
Thank You
