Add new similarity field to knn clause in _search #94828

Merged

Conversation

benwtrent
Member

This adds a new similarity parameter to knn that filters out nearest neighbor results that fall outside a given similarity.

num_candidates and k are still required, as they control the accuracy and exploration of the nearest-neighbor vector search. For each shard, the query will search num_candidates candidates and keep only those that are within the provided similarity boundary, and then finally reduce to the global top k as normal.

For example, when using the l2_norm indexed similarity value, this could be considered a radius post-filter on knn.
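
The per-shard filter and global reduce described above can be sketched roughly as follows. This is a minimal brute-force illustration, not the code in this PR: the class `KnnSimilaritySketch`, its `Hit` type, and the methods `searchShard` and `reduce` are hypothetical names, the real search uses an approximate index rather than a full scan, and the sketch assumes the l2_norm case where the similarity acts as a maximum distance (radius).

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical sketch: brute-force stand-in for a per-shard kNN search
// followed by a similarity (radius) post-filter and a global top-k reduce.
class KnnSimilaritySketch {

    // A candidate document and its Euclidean distance to the query vector.
    static final class Hit {
        final String id;
        final double distance;

        Hit(String id, double distance) {
            this.id = id;
            this.distance = distance;
        }
    }

    // Euclidean (l2_norm) distance between two vectors of equal length.
    static double l2(float[] a, float[] b) {
        double sum = 0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return Math.sqrt(sum);
    }

    // Per shard: take the numCandidates nearest vectors, then keep only
    // those whose distance is within the given radius.
    static List<Hit> searchShard(List<String> ids, List<float[]> vectors,
                                 float[] query, int numCandidates, double radius) {
        List<Hit> hits = new ArrayList<>();
        for (int i = 0; i < ids.size(); i++) {
            hits.add(new Hit(ids.get(i), l2(vectors.get(i), query)));
        }
        hits.sort(Comparator.comparingDouble((Hit h) -> h.distance));

        List<Hit> kept = new ArrayList<>();
        for (Hit h : hits.subList(0, Math.min(numCandidates, hits.size()))) {
            if (h.distance <= radius) {
                kept.add(h);
            }
        }
        return kept;
    }

    // Coordinator: merge the filtered shard results and keep the global top k.
    static List<Hit> reduce(List<List<Hit>> shardResults, int k) {
        List<Hit> all = new ArrayList<>();
        for (List<Hit> shard : shardResults) {
            all.addAll(shard);
        }
        all.sort(Comparator.comparingDouble((Hit h) -> h.distance));
        return new ArrayList<>(all.subList(0, Math.min(k, all.size())));
    }
}
```

Because the filter runs after candidate collection, a shard can return fewer than num_candidates hits, and the final result can contain fewer than k hits when not enough vectors fall inside the boundary.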

relates to: #84929 && #93574

@benwtrent benwtrent added >feature release highlight :Search/Search Search-related issues that do not fall into other categories v8.8.0 labels Mar 28, 2023
@benwtrent benwtrent requested a review from javanna March 28, 2023 12:58
@github-actions
Contributor

Documentation preview:

@elasticsearchmachine elasticsearchmachine added the Team:Search Meta label for search team label Mar 28, 2023
@elasticsearchmachine
Collaborator

Pinging @elastic/es-search (Team:Search)

@elasticsearchmachine
Collaborator

Hi @benwtrent, I've created a changelog YAML for you. Note that since this PR is labelled release highlight, you need to update the changelog YAML to fill out the extended information sections.

Member

@javanna javanna left a comment

LGTM!!!


import static org.elasticsearch.common.Strings.format;

public class VectorSimilarityQuery extends Query {
Member

Would you consider contributing this query back to Lucene?

Member Author

Possibly. There was some pushback on whether such a thing was even necessary, since the implementation is so simple.

I can open a PR to see what noise is made :)

@benwtrent benwtrent merged commit f23b906 into elastic:main Mar 28, 2023
@benwtrent benwtrent deleted the feature/add-similarity-threshold-to-knn branch March 28, 2023 19:29
Labels
>feature release highlight :Search/Search Search-related issues that do not fall into other categories Team:Search Meta label for search team v8.8.0