Vector Similarity Search & Indexing Methods
Xiaomeng Yi
Senior Researcher, Zilliz
© 2020 Zilliz. All rights reserved.
Vector Similarity Search
Information Retrieval: from text to versatile data types
How to measure similarity between data?
Embeddings: represent data as vectors
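Once data is represented as vectors, similarity between two items can be computed directly on their vectors. The slides do not name a specific metric, so the three functions below (Euclidean distance, inner product, cosine similarity) are assumptions chosen for illustration. A minimal sketch in plain Python:

```python
import math
from typing import Sequence


def l2_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance: smaller means more similar."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def inner_product(a: Sequence[float], b: Sequence[float]) -> float:
    """Dot product: larger means more similar."""
    return sum(x * y for x, y in zip(a, b))


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Inner product of the two vectors after scaling each to unit length."""
    norm_a = math.sqrt(inner_product(a, a))
    norm_b = math.sqrt(inner_product(b, b))
    return inner_product(a, b) / (norm_a * norm_b)
```

Which function fits best depends on how the embedding model was trained; that choice is outside what the slides cover.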
Efficiency problem for big data
• Trade accuracy for efficiency
• Indexing method
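To make the trade-off concrete, the sketch below shows the exhaustive scan that an index tries to avoid, plus a recall measure for judging how much accuracy an approximate method gives up. The function names and the choice of recall as the accuracy measure are assumptions for illustration, not something the slides specify.

```python
import heapq
import math


def l2(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def exact_top_k(query, vectors, k):
    """Exhaustive scan: compares the query with every stored vector."""
    return heapq.nsmallest(k, range(len(vectors)),
                           key=lambda i: l2(query, vectors[i]))


def recall_at_k(approx_ids, exact_ids):
    """Fraction of the true top-k ids that an approximate result contains."""
    return len(set(approx_ids) & set(exact_ids)) / len(exact_ids)


vectors = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [5.0, 5.0]]
truth = exact_top_k([0.9, 0.1], vectors, k=2)   # ids of the two closest vectors
```

The scan costs one distance computation per stored vector for every query, which is the efficiency problem on large datasets. An approximate index answers faster and is then scored with something like `recall_at_k(approx_ids, truth)`.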
Indexing Methods
Graph based Index: general idea
Approximate nearest neighbor algorithm based on navigable small world graphs
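The general idea of a graph based index can be sketched as a greedy walk: every vector is a node linked to some neighbors, and a query moves from an entry node to whichever neighbor is closer to the query until no neighbor improves. The code below is a simplified illustration under that assumption, with a hand-built toy graph; it is not the full algorithm from the cited paper, and the names `greedy_search`, `neighbors`, and `entry` are hypothetical.

```python
import math


def l2(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def greedy_search(query, vectors, neighbors, entry):
    """Walk the graph, always stepping to the neighbor closest to the query."""
    current = entry
    current_dist = l2(query, vectors[current])
    while True:
        best, best_dist = current, current_dist
        for n in neighbors[current]:
            d = l2(query, vectors[n])
            if d < best_dist:
                best, best_dist = n, d
        if best == current:          # no neighbor is closer: stop here
            return current, current_dist
        current, current_dist = best, best_dist


# Toy data: five points on a line, linked in a ring (node 0 also links to 4).
vectors = [[0.0], [1.0], [2.0], [3.0], [4.0]]
neighbors = {0: [1, 4], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3, 0]}
node, dist = greedy_search([2.9], vectors, neighbors, entry=0)
```

Here the walk goes 0, 4, 3 and stops at node 3. Only a handful of distances are computed instead of one per stored vector, but the walk can stop at a local minimum, which is why the result is approximate.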
Graph: optimizations
Efficient and robust approximate nearest neighbor search using Hierarchical Navigable Small World graphs
Approximate nearest neighbor algorithm based on navigable small world graphs
Space Partition based index: example
Approximate nearest neighbor methods and vector models
The Inverted Multi-Index
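A space partition index can be sketched as an inverted file: cluster the vectors into regions, keep one list of vector ids per region, and at query time scan only the lists of the regions nearest to the query. The code below is a minimal illustration under those assumptions. The k-means routine uses a naive initialisation (the first `n_clusters` vectors), and `nprobe`, `build_ivf`, and `ivf_search` are names chosen here, not taken from the slides.

```python
import math


def l2(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def nearest_centroid(v, centroids):
    return min(range(len(centroids)), key=lambda c: l2(v, centroids[c]))


def kmeans(vectors, n_clusters, n_iters=10):
    # Naive init (assumption): start from the first n_clusters vectors.
    centroids = [list(v) for v in vectors[:n_clusters]]
    for _ in range(n_iters):
        groups = [[] for _ in range(n_clusters)]
        for v in vectors:
            groups[nearest_centroid(v, centroids)].append(v)
        for c, group in enumerate(groups):
            if group:  # keep the old centroid if a region ends up empty
                centroids[c] = [sum(col) / len(group) for col in zip(*group)]
    return centroids


def build_ivf(vectors, centroids):
    """One inverted list of vector ids per region."""
    lists = [[] for _ in centroids]
    for i, v in enumerate(vectors):
        lists[nearest_centroid(v, centroids)].append(i)
    return lists


def ivf_search(query, vectors, centroids, lists, k, nprobe):
    """Scan only the nprobe regions whose centroids are closest to the query."""
    probe = sorted(range(len(centroids)),
                   key=lambda c: l2(query, centroids[c]))[:nprobe]
    candidates = [i for c in probe for i in lists[c]]
    return sorted(candidates, key=lambda i: l2(query, vectors[i]))[:k]


vectors = [[0.0, 0.0], [0.0, 1.0], [1.0, 0.0],
           [10.0, 10.0], [10.0, 11.0], [11.0, 10.0]]
centroids = kmeans(vectors, n_clusters=2)
lists = build_ivf(vectors, centroids)
result = ivf_search([0.2, 0.1], vectors, centroids, lists, k=2, nprobe=1)
```

Raising `nprobe` scans more regions, which costs more time but lowers the chance of missing a true neighbor that sits just across a region boundary.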
Optimization for space partition
Revisiting the Inverted Indices for Billion-Scale Approximate Nearest Neighbors
Encoding based index: general idea
Product quantization for nearest neighbor search
Encoding: product quantization
Similarity Query Processing for High-Dimensional Data
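Product quantization can be sketched as follows: split each vector into `m` sub-vectors, learn a small codebook per sub-space, and store only the index of the nearest codebook entry for each sub-vector. Distances to a query are then summed from per-sub-space lookup tables. The code below is a minimal sketch under stated assumptions: the vector dimension is divisible by `m`, k-means starts naively from the first points, and all function names are hypothetical.

```python
def sq_l2(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, n_clusters, n_iters=10):
    # Naive init (assumption): start from the first n_clusters points.
    centroids = [list(p) for p in points[:n_clusters]]
    for _ in range(n_iters):
        groups = [[] for _ in range(n_clusters)]
        for p in points:
            nearest = min(range(n_clusters), key=lambda c: sq_l2(p, centroids[c]))
            groups[nearest].append(p)
        for c, group in enumerate(groups):
            if group:
                centroids[c] = [sum(col) / len(group) for col in zip(*group)]
    return centroids


def split(vector, m):
    """Cut a vector into m equal-length sub-vectors."""
    step = len(vector) // m
    return [vector[j * step:(j + 1) * step] for j in range(m)]


def train_pq(vectors, m, n_codes):
    """One codebook (list of centroids) per sub-space."""
    return [kmeans([split(v, m)[j] for v in vectors], n_codes) for j in range(m)]


def encode(vector, codebooks):
    """Replace each sub-vector by the index of its nearest codebook entry."""
    return [min(range(len(cb)), key=lambda c: sq_l2(sub, cb[c]))
            for sub, cb in zip(split(vector, len(codebooks)), codebooks)]


def adc_distance(query, code, codebooks):
    """Approximate squared distance: exact query vs. quantized stored vector."""
    tables = [[sq_l2(sub, centroid) for centroid in cb]
              for sub, cb in zip(split(query, len(codebooks)), codebooks)]
    return sum(tables[j][code[j]] for j in range(len(codebooks)))


vectors = [[0.0, 0.0, 0.0, 0.0], [10.0, 10.0, 10.0, 10.0],
           [0.0, 1.0, 10.0, 9.0], [10.0, 9.0, 0.0, 1.0]]
codebooks = train_pq(vectors, m=2, n_codes=2)
codes = [encode(v, codebooks) for v in vectors]
```

Each stored vector shrinks from four floats to two small integers in this toy case. In practice the lookup tables would be built once per query and reused for every code, rather than rebuilt inside `adc_distance` as this sketch does for brevity.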
Comparison
Fast, accurate, and small, never reached at the same time…
[Figure: diagram placing HNSW, L&C, IVF_PQ, IVF_SQ, FLAT, and ∅ among the three properties Fast, Accurate, and Small]
Flexible indexes: A layered framework
Layers: function decomposition
Layer function      | Data               | Size   | Candidates for a query | Requirement
Space partition     | Regions            | Small  | Full                   | Accurate, fast
Candidate filtering | Compressed vectors | Medium | Small portion          | Fast
Result validation   | Original vectors   | Large  | Very small portion     | Accurate
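The three layers can be read as a pipeline in which each stage narrows the candidate set before handing it to a more expensive stage. The sketch below illustrates that flow only. The grid-snapping `coarse` function is a stand-in for whatever compression the candidate filtering layer actually uses, and every name and parameter here is an assumption rather than the framework's real interface.

```python
import math


def l2(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def coarse(v, step=1.0):
    """Stand-in compression (assumption): snap each coordinate to a grid."""
    return [round(x / step) * step for x in v]


def layered_search(query, centroids, region_members, compressed, original,
                   k, nprobe, n_candidates):
    # Layer 1, space partition: keep only the regions closest to the query.
    regions = sorted(range(len(centroids)),
                     key=lambda r: l2(query, centroids[r]))[:nprobe]
    in_regions = [i for r in regions for i in region_members[r]]
    # Layer 2, candidate filtering: cheap ranking on compressed vectors.
    shortlist = sorted(in_regions,
                       key=lambda i: l2(query, compressed[i]))[:n_candidates]
    # Layer 3, result validation: exact re-ranking on the original vectors.
    return sorted(shortlist, key=lambda i: l2(query, original[i]))[:k]


original = [[0.1, 0.2], [0.4, 0.4], [1.2, 0.9], [9.8, 10.1], [10.3, 9.9]]
compressed = [coarse(v) for v in original]
centroids = [[0.5, 0.5], [10.0, 10.0]]       # hand-picked for the toy data
region_members = [[0, 1, 2], [3, 4]]
result = layered_search([0.3, 0.3], centroids, region_members, compressed,
                        original, k=1, nprobe=1, n_candidates=2)
```

Only the last layer touches the original vectors, and it does so for just `n_candidates` items, which matches the "very small portion" in the table.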
Layers: optimization opportunity
Layer function      | Size   | Requirement    | Index type (adjustable) | Optimization opportunity
Space partition     | Small  | Accurate, fast | Graph                   | Cache-based optimization
Candidate filtering | Medium | Small          | Coarse encoding         | Data locality, inter/intra query parallelism
Result validation   | Large  | Accurate       | Flat                    | SSD-based storage, compute-read pipeline
Thank you!
