Embeddings (Updated)
Instructors
Prashant Sahu
Manager - Data Science, Analytics Vidhya
Ravi Theja
Developer Advocate Engineer, LlamaIndex
Why do we need Embeddings?
[Figure: Indexing stage, showing the embedding model]
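Below is a minimal sketch of the indexing stage. It assumes a toy stand-in for the embedding model: the hypothetical `toy_embed` function hashes words into a fixed-size count vector, and the hypothetical `chunk_text` and `build_index` helpers split documents into chunks and store one (chunk, vector) pair per chunk. It illustrates only the flow (chunk, embed, store). A real pipeline would call an actual embedding model at the `toy_embed` step.

```python
import re
import zlib

DIM = 64  # size of the toy embedding vectors (arbitrary choice for this sketch)


def toy_embed(text: str, dim: int = DIM) -> list[float]:
    """Hypothetical stand-in for an embedding model: hashed bag-of-words counts."""
    vec = [0.0] * dim
    for word in re.findall(r"[a-z]+", text.lower()):
        # zlib.crc32 is deterministic across runs, unlike Python's built-in hash()
        vec[zlib.crc32(word.encode("utf-8")) % dim] += 1.0
    return vec


def chunk_text(text: str, max_words: int = 8) -> list[str]:
    """Split a document into chunks of at most max_words whitespace-separated words."""
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]


def build_index(documents: list[str]) -> list[tuple[str, list[float]]]:
    """Indexing stage: chunk every document and store (chunk, embedding) pairs."""
    index = []
    for doc in documents:
        for chunk in chunk_text(doc):
            index.append((chunk, toy_embed(chunk)))
    return index


docs = [
    "Embeddings map text to vectors so that similar meanings end up close together.",
    "The indexing stage embeds every chunk and stores the vectors for later retrieval.",
]
index = build_index(docs)
print(len(index), "chunks indexed, each with a", len(index[0][1]), "dimensional vector")
```

A hashed count vector captures word overlap only, not meaning; it is used here so the sketch runs without downloading a model.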
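The similarity between two embeddings is measured with cosine similarity:
cosine_similarity(A, B) = (A · B) / (‖A‖ ‖B‖)
where A and B are the text embedding vectors of two different pieces of text (words, phrases, or document chunks).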
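Cosine similarity, the measure behind the cosine_similarity(...) scores in these slides, divides the dot product of two vectors by the product of their lengths. The result ranges from -1 to 1, and 1 means the vectors point in the same direction. A minimal sketch in plain Python; the 3-dimensional vectors are made up for illustration and are not real model output:

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """cosine_similarity(A, B) = (A . B) / (||A|| * ||B||)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# Toy vectors (hypothetical, for illustration only)
A = [1.0, 2.0, 3.0]
B = [2.0, 4.0, 6.0]   # same direction as A, twice as long
C = [3.0, 0.0, -1.0]  # orthogonal to A: 1*3 + 2*0 + 3*(-1) = 0

print(cosine_similarity(A, B))  # approximately 1.0 (floating-point rounding may apply)
print(cosine_similarity(A, C))  # 0.0
```

Because only the angle matters, B scores the same against A as A itself would, even though B is twice as long.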
Applications of Embeddings
1. Finding Most Similar Words
Word: "king"
Most Similar Words: ["queen", "monarch", "prince", "ruler", "emperor"]
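Finding the most similar words amounts to ranking every other word in a vocabulary by its cosine similarity to the query word. A minimal sketch, assuming a tiny hand-made vocabulary (`VOCAB`) of hypothetical 3-dimensional vectors; real models produce vectors with hundreds or thousands of dimensions, and `most_similar` is an illustrative helper, not a library function:

```python
import math

# Hypothetical hand-made "embeddings", for illustration only.
VOCAB = {
    "king":    [0.90, 0.80, 0.10],
    "queen":   [0.85, 0.75, 0.20],
    "monarch": [0.80, 0.90, 0.10],
    "prince":  [0.70, 0.60, 0.30],
    "car":     [0.10, 0.20, 0.90],
    "banana":  [0.00, 0.10, 0.95],
}


def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def most_similar(word, k=3):
    """Rank every other vocabulary word by cosine similarity to `word`; keep the top k."""
    query = VOCAB[word]
    scores = [(other, cosine_similarity(query, vec))
              for other, vec in VOCAB.items() if other != word]
    scores.sort(key=lambda pair: pair[1], reverse=True)
    return scores[:k]


for w, score in most_similar("king"):
    print(f"{w}: {score:.3f}")
```

With a real vocabulary of many thousands of words, this linear scan is usually replaced by an approximate nearest-neighbour index.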
2. Finding the Odd One Out
Words: ["breakfast", "lunch", "dinner", "car"]
Odd One Out: "car"
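Average the four word embeddings into avg_vector_embed, then compare each word with that average; the word with the lowest cosine similarity is the odd one out:
cosine_similarity(breakfast, avg_vector_embed) = 0.954
cosine_similarity(lunch, avg_vector_embed) = 0.965
cosine_similarity(dinner, avg_vector_embed) = 0.963
cosine_similarity(car, avg_vector_embed) = 0.891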
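The averaging method above can be sketched as follows, assuming hypothetical hand-made vectors in `WORDS`. The scores it prints will not match the slide's numbers, which come from a real embedding model; only the procedure (average, compare, take the minimum) is the same. `odd_one_out` is an illustrative helper name.

```python
import math

# Hypothetical hand-made vectors; real embeddings come from a trained model.
WORDS = {
    "breakfast": [0.90, 0.10, 0.20],
    "lunch":     [0.85, 0.15, 0.10],
    "dinner":    [0.80, 0.20, 0.15],
    "car":       [0.10, 0.90, 0.30],
}


def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def odd_one_out(vectors):
    """Return the word least similar to the average of all vectors, plus every score."""
    dim = len(next(iter(vectors.values())))
    avg_vector_embed = [sum(v[i] for v in vectors.values()) / len(vectors) for i in range(dim)]
    scores = {word: cosine_similarity(vec, avg_vector_embed) for word, vec in vectors.items()}
    return min(scores, key=scores.get), scores


odd, scores = odd_one_out(WORDS)
for word, score in scores.items():
    print(f"cosine_similarity({word}, avg_vector_embed) = {score:.3f}")
print("Odd One Out:", odd)
```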
3. Sentence Similarity
Sentence 1: "The cat sits on the mat."
Sentence 2: "A feline is sitting on a rug."
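The two sentences share almost no words, yet their embeddings should be close. One simple way to build a sentence embedding is to average (mean-pool) the word vectors it contains; dedicated sentence embedding models compute sentence vectors directly, so treat this only as an illustration. The sketch assumes a hypothetical 4-dimensional table `WORD_VECTORS`, skips words not in the table (such as "the" and "on"), and adds a third, unrelated sentence for contrast.

```python
import math
import re

# Hypothetical toy word vectors, made up for illustration.
WORD_VECTORS = {
    "cat":     [0.90, 0.10, 0.00, 0.00],
    "feline":  [0.85, 0.15, 0.00, 0.00],
    "sits":    [0.10, 0.90, 0.00, 0.00],
    "sitting": [0.15, 0.85, 0.00, 0.00],
    "mat":     [0.00, 0.10, 0.90, 0.00],
    "rug":     [0.00, 0.15, 0.85, 0.00],
    "stock":   [0.00, 0.00, 0.10, 0.90],
    "prices":  [0.00, 0.10, 0.00, 0.90],
    "fell":    [0.00, 0.30, 0.00, 0.70],
}


def sentence_embedding(sentence):
    """Mean-pool the vectors of the known words in a sentence."""
    words = re.findall(r"[a-z]+", sentence.lower())
    vecs = [WORD_VECTORS[w] for w in words if w in WORD_VECTORS]
    if not vecs:
        raise ValueError(f"no known words in {sentence!r}")
    return [sum(col) / len(vecs) for col in zip(*vecs)]


def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


s1 = "The cat sits on the mat."
s2 = "A feline is sitting on a rug."
s3 = "Stock prices fell today."  # unrelated sentence, added for contrast

e1, e2, e3 = (sentence_embedding(s) for s in (s1, s2, s3))
print(f"sim(s1, s2) = {cosine_similarity(e1, e2):.3f}")  # high: same meaning, different words
print(f"sim(s1, s3) = {cosine_similarity(e1, e3):.3f}")  # low: unrelated topic
```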
4. Document Clustering
Cluster 1
"AI is transforming the tech industry."
"The new AI model is impressive."
Cluster 2
"Climate change impacts the environment."
"Renewable energy is the future."
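The slides do not say which clustering algorithm produced these groups; k-means with Euclidean distance is one common choice. A minimal sketch, assuming hypothetical 2-dimensional "embeddings" for the four sentences and a deterministic farthest-point initialization so the result is reproducible; `kmeans` here is an illustrative helper, not a library API.

```python
import math

# Hypothetical 2-D "embeddings" for the four example sentences (made up).
SENTENCES = {
    "AI is transforming the tech industry.":   [0.90, 0.10],
    "The new AI model is impressive.":         [0.85, 0.20],
    "Climate change impacts the environment.": [0.10, 0.90],
    "Renewable energy is the future.":         [0.20, 0.80],
}


def kmeans(points, k, iterations=10):
    """Plain k-means with Euclidean distance; returns one cluster label per point."""
    # Deterministic init: start from the first point, then repeatedly add the
    # point farthest from all centroids chosen so far.
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(max(points, key=lambda p: min(math.dist(p, c) for c in centroids)))
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        labels = [min(range(k), key=lambda j: math.dist(p, centroids[j])) for p in points]
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, label in zip(points, labels) if label == j]
            if members:
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels


texts = list(SENTENCES)
labels = kmeans([SENTENCES[t] for t in texts], k=2)
for cluster in sorted(set(labels)):
    print(f"Cluster {cluster + 1}")
    for text, label in zip(texts, labels):
        if label == cluster:
            print(f'  "{text}"')
```

The number of clusters k must be chosen up front; with real embeddings it is often picked by inspecting several values.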
Closed-source Embeddings
OpenAI Embeddings
CohereAI Embeddings
Google Gemini Embeddings
JinaAI Embeddings
Open-source Embeddings
BERT / DistilBERT
BGE
mpnet
e5
How to select the right embeddings?
1. Look for domain-specific embeddings
2. Use state-of-the-art embeddings: compare candidates on the Massive Text Embedding Benchmark (MTEB)
3. Fine-tune embeddings
Thank You