Add new `similarity` field to `knn` clause in `_search` #94828
Documentation preview:
Pinging @elastic/es-search (Team:Search)
Hi @benwtrent, I've created a changelog YAML for you. Note that since this PR is labelled
…benwtrent/elasticsearch into feature/add-similarity-threshold-to-knn
…ity-threshold-to-knn
LGTM!!!
import static org.elasticsearch.common.Strings.format;

public class VectorSimilarityQuery extends Query {
Would you consider contributing this query back to Lucene?
Possibly, there was some pushback that even such a thing was necessary. The implementation is so simple.
I can open a PR to see what noise is made :)
This adds a new parameter to `knn` that filters out nearest-neighbor results that fall outside a given similarity.

`num_candidates` and `k` are still required, as they control the accuracy and exploration of the nearest-neighbor vector search. For each shard, the query searches `num_candidates` and keeps only those within the provided `similarity` boundary, then finally reduces to only the global top `k` as normal.

For example, when using the `l2_norm` indexed similarity value, this could be considered a `radius` post-filter on `knn`.

relates to: #84929 && #93574
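
To make the flow described above concrete, here is a minimal sketch of the `l2_norm` "radius" case: each shard keeps only the candidates within the radius, and the coordinator then reduces to the global top `k`. This is not the code from this PR. The names `KnnRadiusSketch`, `Hit`, `filterShard`, and `reduceTopK` are hypothetical, and the sketch assumes each candidate already carries its L2 distance to the query vector.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Illustrative only: a toy model of the per-shard filter and global reduce.
class KnnRadiusSketch {

    /** Hypothetical candidate: a document id and its L2 distance to the query vector. */
    static final class Hit {
        final String docId;
        final float distance;

        Hit(String docId, float distance) {
            this.docId = docId;
            this.distance = distance;
        }
    }

    /** Per shard: keep only the candidates whose distance is within the radius. */
    static List<Hit> filterShard(List<Hit> candidates, float radius) {
        List<Hit> kept = new ArrayList<>();
        for (Hit hit : candidates) {
            if (hit.distance <= radius) {
                kept.add(hit);
            }
        }
        return kept;
    }

    /** Coordinator: merge the filtered shard results and keep the k closest hits. */
    static List<Hit> reduceTopK(List<List<Hit>> shardResults, int k) {
        List<Hit> merged = new ArrayList<>();
        for (List<Hit> shard : shardResults) {
            merged.addAll(shard);
        }
        // Smaller distance means a closer neighbor, so sort ascending.
        merged.sort((a, b) -> Float.compare(a.distance, b.distance));
        return new ArrayList<>(merged.subList(0, Math.min(k, merged.size())));
    }

    public static void main(String[] args) {
        // Each list stands in for the num_candidates results gathered on one shard.
        List<Hit> shard1 = Arrays.asList(new Hit("a", 0.5f), new Hit("b", 2.0f));
        List<Hit> shard2 = Arrays.asList(new Hit("c", 0.2f), new Hit("d", 0.9f));

        float radius = 1.0f;
        List<List<Hit>> filtered = new ArrayList<>();
        filtered.add(filterShard(shard1, radius));
        filtered.add(filterShard(shard2, radius));

        for (Hit hit : reduceTopK(filtered, 2)) {
            System.out.println(hit.docId + " " + hit.distance);
        }
    }
}
```

Because the filter runs after the approximate search, fewer than `k` hits may come back when few candidates fall inside the boundary.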