Embeddings
Presented by
Featured Speaker
Jocelyn Matthews
Head of Community, Pinecone
jocelyn@pinecone.io
What are embeddings?
Embeddings are numerical representations that capture the essential
features and relationships of discrete objects, like words or documents,
in a continuous vector space.
Embeddings:
● Are dynamic and context-sensitive
● Capture the essence of the data they represent
● Are influenced by the context in which they are used
● Are powerful because they are adaptable
Humans think in sensations, words, ideas.
Computers think in numbers.
You don’t need to memorize this now
Vector: a list of numbers that tell us about something
Vector space: an environment in which vectors exist
Semantics: the study of meaning communicated through language
Vectors
A vector is a mathematical structure
with a size and a direction. For
example, we can think of the vector
as a point in space, with the
“direction” being an arrow from
(0,0,0) to that point in the vector
space.
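To make "size and direction" concrete, here is a minimal sketch in plain Python. The function names and the sample point are illustrative assumptions, not something taken from the slides.

```python
import math

def magnitude(v):
    """Size of the vector: length of the arrow from the origin to the point v."""
    return math.sqrt(sum(x * x for x in v))

def direction(v):
    """Unit vector pointing the same way as v (v must be non-zero)."""
    m = magnitude(v)
    if m == 0:
        raise ValueError("the zero vector has no direction")
    return [x / m for x in v]

# Illustrative point in 3-D space (hypothetical values).
point = [3.0, 4.0, 0.0]
size = magnitude(point)   # length of the arrow from (0,0,0) to the point
unit = direction(point)   # same arrow, scaled to length 1
```

Two vectors can share a direction while having different sizes, which is one reason similarity measures often compare direction only.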
Vectors
As developers, it might be easier to
think of a vector as an array
containing numerical values. For
example:
vector = [0,-2,...4]
Vectors
When we look at a bunch of vectors
in one space, we can say that some
are closer to one another, while
others are far apart. Some vectors
can seem to cluster together, while
others could be sparsely distributed
in the space.
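One way to put a number on "closer" and "far apart" is the straight-line (Euclidean) distance. The sketch below uses three hand-picked 2-D points; the names and values are assumptions chosen only to show a small cluster and an outlier.

```python
import math

def euclidean_distance(a, b):
    """Straight-line distance between two vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# Hypothetical 2-D vectors chosen by hand for illustration.
points = {
    "a": [1.0, 1.0],
    "b": [1.5, 1.2],
    "c": [9.0, 8.0],
}

def nearest(name):
    """Return the name of the other point closest to `name`."""
    others = (n for n in points if n != name)
    return min(others, key=lambda n: euclidean_distance(points[name], points[n]))

nearest_to_a = nearest("a")  # a and b sit close together
nearest_to_c = nearest("c")  # c is far from both a and b
```

Here "a" and "b" form a tiny cluster, while "c" is isolated: its nearest neighbour is still roughly ten units away.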
An example you can bank on
🏦 Where is the Bank of England?
🌱 Where is the grassy bank?
🛩️ How does a plane bank?
🐝 “the bees decided to have a mutiny against their queen”
🐝 “flying stinging insects rebelled in opposition to the
matriarch”
Polysemy and homonyms
Embeddings visualized
apidays Paris 2024 - Embeddings: Core Concepts for Developers, Jocelyn Matthews, Pinecone
Owning the concepts
Word arithmetic
king – man + woman = queen
Image, Peter Sutor, “Metaconcepts: Isolating Context in Word Embeddings”
“Distributed Representations of Words and Phrases and their Compositionality”
“adding the vectors associated with the words king
and woman while subtracting man is equal to the
vector associated with queen. This describes a gender
relationship.”
– MIT Technology Review, 2015
Word arithmetic
Paris - France + Poland = Warsaw
“In this case, the vector difference between Paris and
France captures the concept of capital city.”
– MIT Technology Review, 2015
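The arithmetic can be sketched with toy numbers. The 2-D vectors below are made up by hand (one axis loosely standing for "royalty", the other for "gender"); real embeddings have hundreds of dimensions and are learned, so treat this as an illustration of the mechanics only.

```python
import math

# Hand-made toy vectors (assumption): axis 0 ~ "royalty", axis 1 ~ "gender".
toy = {
    "king":  [0.9, 0.9],
    "queen": [0.9, -0.9],
    "man":   [0.1, 0.9],
    "woman": [0.1, -0.9],
    "apple": [-0.8, 0.0],
}

def cosine_similarity(a, b):
    """Cosine of the angle between a and b (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

def analogy(a, b, c):
    """Return the word closest to vec(a) - vec(b) + vec(c), excluding the inputs."""
    target = [x - y + z for x, y, z in zip(toy[a], toy[b], toy[c])]
    candidates = [w for w in toy if w not in (a, b, c)]
    return max(candidates, key=lambda w: cosine_similarity(target, toy[w]))

result = analogy("king", "man", "woman")
```

With these toy values the nearest remaining word is "queen". Note that the input words are excluded from the candidates, a common convention when evaluating analogies.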
Proximity
Together and apart
Coffee: Cup, Caffeine, Morning, Galaxy, Dinosaur
Hospital: Doctor, Patient, Surgery, Volcano, Unicorn
Music: Song, Melody, Instrument, Asteroid, Bacteria
Restaurant: Food, Menu, Waiter, Nebula, Dragon
School: Teacher, Classroom, Student, Volcano, Spaceship, Exam
Dimensionality!
Green is to blue
As orange is to…
As orange is to… yep! Red.
What’s The Fallacy?
Why "Green : Blue :: Orange : Red" is Imperfect as a Teaching Tool
• Simplicity of relationships
  • Linear vs nuanced
• Lack of context
  • How are the words used?
• Dimensionality
  • 3D vs hundreds of dimensions
• Oversimplification
Life is Like a Box of…
(Or, “Check the Vectors”)
Check the vectors
The distance between red and orange
is very similar to the distance between blue and green…
But when we tested this to verify it,
we got interesting results that show the model's
"understanding" of the relationship.
This actually yields this
# Find the term whose distance and direction from "red" best match
# the distance and direction blue has from green.
# get_embedding, cosine_distance_and_direction, distance_green_blue and
# direction_green_blue are not shown on the slide.
target_distance = distance_green_blue
target_direction = direction_green_blue

# Define a list of terms to compare
terms = ["red", "orange", "yellow", "green", "blue", "purple", "pink", "black", "white", "gray"]

# Get the embedding for each term
term_embeddings = {term: get_embedding(term) for term in terms}

# Track the term whose distance from the start term is closest to the target
closest_term = None
closest_distance = float('inf')
start_term = "red"
start_embedding = get_embedding(start_term)
for term, embedding in term_embeddings.items():
    if term == start_term:
        continue
    distance, direction = cosine_distance_and_direction(start_embedding, embedding)
    # Keep only terms in the target direction, then minimize the distance gap
    if direction == target_direction and abs(distance - target_distance) < closest_distance:
        closest_distance = abs(distance - target_distance)
        closest_term = term

closest_term, closest_distance
Check the vectors
The distance between red and orange
is very similar to the distance between blue and green…
But when we played around to verify, we
got interesting results revealing the
semantic "understanding" of the
relationship.
This actually yields this
('purple', np.float64(0.006596347059928065))
Purple
Why not 'orange'?
The code's result of ('purple',
np.float64(0.006596347059928065))
means that, in the embedding space used by
the model, the distance from "red" to "purple"
matches the distance from "green" to "blue" more
closely than the distance from "red" to "orange" does.
The second value is the remaining gap between
those two distances, not the distance from "red"
to "purple" itself.
This is likely due to the specific contexts and
relationships captured by the model during training.
In short, the code yields 'purple' instead of 'orange'
because, among the terms that pass the direction
check, "purple" is the closest match to the target
distance and direction from "green" to "blue".
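The slide's helpers (get_embedding, cosine_distance_and_direction) are not shown, so the following is only a sketch of one way such a check could work, under stated assumptions: "distance" is taken to be cosine distance, "direction" is taken to be the difference vector between two embeddings, and directions are compared by alignment rather than exact equality. The 3-D vectors are made-up stand-ins for real embeddings, so the outcome says nothing about any real model.

```python
import math

def cosine_distance(a, b):
    """1 - cosine similarity; 0.0 means the vectors point the same way."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return 1.0 - dot / (norm_a * norm_b)

def offset(a, b):
    """Difference vector pointing from a to b (our stand-in for 'direction')."""
    return [y - x for x, y in zip(a, b)]

# Made-up 3-D stand-ins for real embeddings (assumption, illustration only).
toy_embeddings = {
    "green":  [0.2, 0.9, 0.3],
    "blue":   [0.1, 0.4, 0.9],
    "red":    [0.9, 0.2, 0.2],
    "orange": [0.9, 0.5, 0.1],
    "purple": [0.7, 0.1, 0.7],
}

def best_match(start, ref_from, ref_to, emb=toy_embeddings):
    """Find the term whose distance from `start` best matches the reference
    distance, among terms whose offset is aligned with the reference offset."""
    ref_distance = cosine_distance(emb[ref_from], emb[ref_to])
    ref_offset = offset(emb[ref_from], emb[ref_to])
    best_term, best_gap = None, float("inf")
    for term, vec in emb.items():
        if term in (start, ref_from, ref_to):
            continue
        # Alignment is the cosine similarity of the two offsets.
        alignment = 1.0 - cosine_distance(ref_offset, offset(emb[start], vec))
        if alignment <= 0.0:
            continue  # points away from the reference direction
        gap = abs(cosine_distance(emb[start], vec) - ref_distance)
        if gap < best_gap:
            best_term, best_gap = term, gap
    return best_term, best_gap

term, gap = best_match("red", "green", "blue")
```

The returned gap is the difference between two distances, which is why a tiny value does not by itself mean the two words are near each other. Comparing directions with a threshold instead of `==` is a design choice of this sketch: two floating-point vectors are almost never exactly equal.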
Embeddings
TL;DR
What are embeddings?
Embeddings are numerical representations that capture the essential
features and relationships of discrete objects, like words or documents,
in a continuous vector space.
The most important thing to understand
Embeddings are numerical representations of data that:
capture semantic meaning
and
allow for efficient comparison of similarity.
Key points about embeddings
1. They can represent various data types, not just text.
2. Dimensionality
3. Context sensitivity affects interpretation and application.
Applications of embeddings include:
- Semantic search
- Question-answering applications
- Image search
- Audio search
- Recommender systems
- Anomaly detection
“Generate your own embeddings”
(Inference API)
Sample app
Legal Semantic Search
Sample app
Shop the Look
© 2024 Pinecone – All rights reserved
1. Questions?
#hallwaytrack
2. Recording?
YouTube!
3. Slides?
Ask me
Thank you!
jocelyn@pinecone.io
Developing Security Orchestration, Automation, and Response Applications
VICTOR MAESTRE RAMIREZ
 
CTS EXCEPTIONSPrediction of Aluminium wire rod physical properties through AI...
CTS EXCEPTIONSPrediction of Aluminium wire rod physical properties through AI...CTS EXCEPTIONSPrediction of Aluminium wire rod physical properties through AI...
CTS EXCEPTIONSPrediction of Aluminium wire rod physical properties through AI...
ThanushsaranS
 
Template_A3nnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnn
Template_A3nnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnTemplate_A3nnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnn
Template_A3nnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnn
cegiver630
 
04302025_CCC TUG_DataVista: The Design Story
04302025_CCC TUG_DataVista: The Design Story04302025_CCC TUG_DataVista: The Design Story
04302025_CCC TUG_DataVista: The Design Story
ccctableauusergroup
 
Thingyan is now a global treasure! See how people around the world are search...
Thingyan is now a global treasure! See how people around the world are search...Thingyan is now a global treasure! See how people around the world are search...
Thingyan is now a global treasure! See how people around the world are search...
Pixellion
 
Medical Dataset including visualizations
Medical Dataset including visualizationsMedical Dataset including visualizations
Medical Dataset including visualizations
vishrut8750588758
 
EDU533 DEMO.pptxccccvbnjjkoo jhgggggbbbb
EDU533 DEMO.pptxccccvbnjjkoo jhgggggbbbbEDU533 DEMO.pptxccccvbnjjkoo jhgggggbbbb
EDU533 DEMO.pptxccccvbnjjkoo jhgggggbbbb
JessaMaeEvangelista2
 
Adobe Analytics NOAM Central User Group April 2025 Agent AI: Uncovering the S...
Adobe Analytics NOAM Central User Group April 2025 Agent AI: Uncovering the S...Adobe Analytics NOAM Central User Group April 2025 Agent AI: Uncovering the S...
Adobe Analytics NOAM Central User Group April 2025 Agent AI: Uncovering the S...
gmuir1066
 
Cleaned_Lecture 6666666_Simulation_I.pdf
Cleaned_Lecture 6666666_Simulation_I.pdfCleaned_Lecture 6666666_Simulation_I.pdf
Cleaned_Lecture 6666666_Simulation_I.pdf
alcinialbob1234
 
DPR_Expert_Recruitment_notice_Revised.pdf
DPR_Expert_Recruitment_notice_Revised.pdfDPR_Expert_Recruitment_notice_Revised.pdf
DPR_Expert_Recruitment_notice_Revised.pdf
inmishra17121973
 
Data Science Courses in India iim skills
Data Science Courses in India iim skillsData Science Courses in India iim skills
Data Science Courses in India iim skills
dharnathakur29
 
Ch3MCT24.pptx measure of central tendency
Ch3MCT24.pptx measure of central tendencyCh3MCT24.pptx measure of central tendency
Ch3MCT24.pptx measure of central tendency
ayeleasefa2
 
LLM finetuning for multiple choice google bert
LLM finetuning for multiple choice google bertLLM finetuning for multiple choice google bert
LLM finetuning for multiple choice google bert
ChadapornK
 
Day 1 - Lab 1 Reconnaissance Scanning with NMAP, Vulnerability Assessment wit...
Day 1 - Lab 1 Reconnaissance Scanning with NMAP, Vulnerability Assessment wit...Day 1 - Lab 1 Reconnaissance Scanning with NMAP, Vulnerability Assessment wit...
Day 1 - Lab 1 Reconnaissance Scanning with NMAP, Vulnerability Assessment wit...
Abodahab
 
VKS-Python Basics for Beginners and advance.pptx
VKS-Python Basics for Beginners and advance.pptxVKS-Python Basics for Beginners and advance.pptx
VKS-Python Basics for Beginners and advance.pptx
Vinod Srivastava
 
Safety Innovation in Mt. Vernon A Westchester County Model for New Rochelle a...
Safety Innovation in Mt. Vernon A Westchester County Model for New Rochelle a...Safety Innovation in Mt. Vernon A Westchester County Model for New Rochelle a...
Safety Innovation in Mt. Vernon A Westchester County Model for New Rochelle a...
James Francis Paradigm Asset Management
 
Geometry maths presentation for begginers
Geometry maths presentation for begginersGeometry maths presentation for begginers
Geometry maths presentation for begginers
zrjacob283
 
Secure_File_Storage_Hybrid_Cryptography.pptx..
Secure_File_Storage_Hybrid_Cryptography.pptx..Secure_File_Storage_Hybrid_Cryptography.pptx..
Secure_File_Storage_Hybrid_Cryptography.pptx..
yuvarajreddy2002
 
Minions Want to eat presentacion muy linda
Minions Want to eat presentacion muy lindaMinions Want to eat presentacion muy linda
Minions Want to eat presentacion muy linda
CarlaAndradesSoler1
 
Developing Security Orchestration, Automation, and Response Applications
Developing Security Orchestration, Automation, and Response ApplicationsDeveloping Security Orchestration, Automation, and Response Applications
Developing Security Orchestration, Automation, and Response Applications
VICTOR MAESTRE RAMIREZ
 
Ad

apidays Paris 2024 - Embeddings: Core Concepts for Developers, Jocelyn Matthews, Pinecone

  • 2. Featured Speaker Jocelyn Matthews, Head of Community, Pinecone. Presented by jocelyn@pinecone.io
  • 3. What are embeddings? Embeddings are numerical representations that capture the essential features and relationships of discrete objects, like words or documents, in a continuous vector space.
  • 4. Embeddings: ● Are dynamic and context-sensitive. ● Capture the essence of the data they represent ● Are influenced by the context in which they are used ● Adaptability makes them powerful Humans think in sensations, words, ideas. Computers think in numbers
  • 5. You don’t need to memorize this now Vector: a list of numbers that tell us about something Vector space: an environment in which vectors exist Semantics: the study of meaning communicated through language
  • 6. Vectors A vector is a mathematical structure with a size and a direction. For example, we can think of the vector as a point in space, with the “direction” being an arrow from (0,0,0) to that point in the vector space.
  • 7. Vectors As developers, it might be easier to think of a vector as an array containing numerical values. For example: vector = [0,-2,...4]
  • 8. Vectors When we look at a bunch of vectors in one space, we can say that some are closer to one another, while others are far apart. Some vectors can seem to cluster together, while others could be sparsely distributed in the space.
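To make "closer" and "far apart" from slide 8 concrete, here is a minimal sketch using two common measures, Euclidean distance and cosine similarity. The three vectors and their names (`cat`, `kitten`, `truck`) are made up for illustration and do not come from any real embedding model.

```python
import math

def euclidean_distance(a, b):
    """Straight-line distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors; 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hypothetical 3-dimensional vectors (assumption: illustration only)
cat = [0.9, 0.8, 0.1]
kitten = [0.85, 0.9, 0.15]
truck = [0.1, 0.2, 0.95]

near = euclidean_distance(cat, kitten)  # small: the two vectors cluster together
far = euclidean_distance(cat, truck)    # large: the two vectors are far apart
print(near < far)
print(cosine_similarity(cat, kitten) > cosine_similarity(cat, truck))
```

Which measure fits best depends on the embedding model; cosine similarity ignores vector length and compares direction only.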
  • 9. An example you can bank on 🏦 Where is the Bank of England? 🌱 Where is the grassy bank? 🛩️ How does a plane bank? 🐝 “the bees decided to have a mutiny against their queen” 🐝 “flying stinging insects rebelled in opposition to the matriarch”
  • 14. Word arithmetic king – man + woman = queen Image, Peter Sutor, “Metaconcepts: Isolating Context in Word Embeddings”
  • 15. Word arithmetic king – man + woman = queen “Distributed Representations of Words and Phrases and their Compositionality”
  • 16. Word arithmetic king – man + woman = queen “adding the vectors associated with the words king and woman while subtracting man is equal to the vector associated with queen. This describes a gender relationship.” – MIT Technology Review, 2015
  • 17. Word arithmetic Paris - France + Poland = Warsaw
  • 18. Word arithmetic Paris - France + Poland = Warsaw “In this case, the vector difference between Paris and France captures the concept of capital city.” – MIT Technology Review, 2015
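The word arithmetic on slides 14 to 18 can be sketched with a toy vocabulary. Everything below is an assumption made for illustration: the vectors are hand-built so that one dimension loosely tracks "royalty" and another "gender", and the `analogy` helper is hypothetical. Real models learn hundreds of dimensions, and the arithmetic only holds approximately.

```python
import math

# Made-up 3-D vectors (assumption); real embeddings are learned, not hand-written.
TOY_VECTORS = {
    "king":  [0.9,  0.9, 0.1],
    "queen": [0.9, -0.9, 0.1],
    "man":   [0.1,  0.9, 0.0],
    "woman": [0.1, -0.9, 0.0],
    "apple": [0.0,  0.0, 1.0],
}

def cosine_similarity(u, v):
    """Cosine of the angle between u and v."""
    dot = sum(x * y for x, y in zip(u, v))
    return dot / (math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v)))

def analogy(a, b, c, vectors):
    """Return the word whose vector is nearest to vec(a) - vec(b) + vec(c)."""
    target = [x - y + z for x, y, z in zip(vectors[a], vectors[b], vectors[c])]
    # Skip the query words themselves, as is common for this kind of lookup.
    candidates = (w for w in vectors if w not in (a, b, c))
    return max(candidates, key=lambda w: cosine_similarity(vectors[w], target))

result = analogy("king", "man", "woman", TOY_VECTORS)
print(result)  # queen (by construction of the toy vectors)
```

The result is the nearest remaining word rather than an exact match, which is why the equals sign in "king – man + woman = queen" is best read as "is closest to".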
  • 24. Green is to blue green blue
  • 25. As orange is to… green orange blue
  • 26. As orange is to…yep! green orange blue red
  • 28. What’s The Fallacy? Why "Green : Blue :: Orange : Red" is Imperfect as a Teaching Tool • Simplicity of relationships (linear vs. nuanced) • Lack of context (how are the words used?) • Dimensionality (3D vs. hundreds of dimensions) • Oversimplification
  • 32. Life is Like a Box of… (Or, ”Check the Vectors”)
  • 33. Check the vectors The distance between red and orange is incredibly similar to the distance between blue and green… But when we tested this to verify, we got interesting results that reveal the model's semantic "understanding" of the relationship. The helpers get_embedding and cosine_distance_and_direction, and the values distance_green_blue and direction_green_blue, are not shown on the slide.

      # Find a term that has the same distance and direction from red
      # that blue has from green
      target_distance = distance_green_blue
      target_direction = direction_green_blue

      # Define a list of terms to compare
      terms = ["red", "orange", "yellow", "green", "blue",
               "purple", "pink", "black", "white", "gray"]

      # Get the embedding for each term
      term_embeddings = {term: get_embedding(term) for term in terms}

      # Find the term whose distance and direction from the start term
      # best match the target distance and direction
      closest_term = None
      closest_distance = float('inf')
      start_term = "red"
      start_embedding = get_embedding(start_term)

      for term, embedding in term_embeddings.items():
          if term == start_term:
              continue
          distance, direction = cosine_distance_and_direction(start_embedding, embedding)
          if direction == target_direction and abs(distance - target_distance) < closest_distance:
              closest_distance = abs(distance - target_distance)
              closest_term = term

      closest_term, closest_distance

  • 34. Check the vectors This actually yields ('purple', np.float64(0.006596347059928065)): purple, not orange.
  • 35. Why not 'orange'? The code's result of ('purple', np.float64(0.006596347059928065)) means that, among the candidate terms that matched the target direction, the distance from "red" to "purple" came closest to the distance from "green" to "blue", differing by only about 0.0066. In the embedding space used by the model, the red-to-purple relationship therefore mirrors green-to-blue more closely than red-to-orange does. This is likely due to the specific contexts and relationships the model captured during training.
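The slides do not show how the helpers behind "Check the vectors" work, so the following is only a sketch of the same idea under stated assumptions. The 2-D vectors are made up, `find_analogous_term` is a hypothetical name, "direction" is interpreted as the cosine similarity between offset vectors (with a threshold instead of an exact equality test), and distance is the Euclidean length of the offset rather than the cosine distance used in the talk. Its output says nothing about what a real embedding model would return.

```python
import math

# Made-up 2-D vectors standing in for real embeddings (assumption).
TOY_EMBEDDINGS = {
    "red": [0.0, 0.0],
    "orange": [1.0, 0.1],
    "yellow": [0.5, -0.5],
    "green": [0.0, 1.0],
    "blue": [1.0, 1.0],
    "purple": [0.2, 1.0],
}

def offset(a, b):
    """Vector pointing from a to b."""
    return [y - x for x, y in zip(a, b)]

def norm(v):
    """Euclidean length of v."""
    return math.sqrt(sum(x * x for x in v))

def cosine_similarity(u, v):
    return sum(x * y for x, y in zip(u, v)) / (norm(u) * norm(v))

def find_analogous_term(start, ref_from, ref_to, embeddings,
                        min_direction_similarity=0.8):
    """Find the term whose offset from `start` best matches ref_from -> ref_to."""
    target = offset(embeddings[ref_from], embeddings[ref_to])
    target_distance = norm(target)
    best_term, best_gap = None, float("inf")
    for term, vec in embeddings.items():
        if term == start:
            continue
        step = offset(embeddings[start], vec)
        if norm(step) == 0:
            continue  # same point as the start term, no direction to compare
        # Keep only terms that lie in roughly the same direction as the target
        if cosine_similarity(step, target) < min_direction_similarity:
            continue
        gap = abs(norm(step) - target_distance)
        if gap < best_gap:
            best_term, best_gap = term, gap
    return best_term, best_gap

term, gap = find_analogous_term("red", "green", "blue", TOY_EMBEDDINGS)
print(term, round(gap, 4))
```

Using a similarity threshold instead of `direction == target_direction` is a deliberate choice here: two floating-point direction vectors are almost never exactly equal, so an equality test would usually reject every candidate.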
  • 37. What are embeddings? Embeddings are numerical representations that capture the essential features and relationships of discrete objects, like words or documents, in a continuous vector space.
  • 38. The most important thing to understand Embeddings are numerical representations of data that capture semantic meaning and allow for efficient comparison of similarity.
  • 39. Key points about embeddings 1. They can represent various data types, not just text. 2. Dimensionality 3. Context sensitivity affects interpretation and application.
  • 40. Applications of embeddings include: - Semantic search - Question-answering applications - Image search - Audio search - Recommender systems - Anomaly detection “Generate your own embeddings” (Inference API)
  • 43. 1. Questions? #hallwaytrack 2. Recording? YouTube! 3. Slides? Ask me © 2024 Pinecone – All rights reserved