Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix multithreading copies in lib vec #108802

Merged
merged 10 commits into from
May 19, 2024

Conversation

ChrisHegarty
Copy link
Contributor

This commit fixes a potential multithreading issue with the lib vec vector scorer.

Since the implementation falls back to a lucene scorer which needs to read from the index input, then we need to make a copy of the index input. Otherwise, there is a potential for the stateful index input to be accessed across threads - which would be bad.

The fallback is only used when one or other vector cross a segment boundary, which is 16G by default. So the likelihood of this occurring in practice is small, but the affect is bad.

The fix is deliberately small and targeted, so that it can be backported. After this change, I'm going to drop the custom VectorScorer and adapter type, in favour of using the Lucene type directly. This custom types were initially used when the code lived inside the native module, where we didn't want to add a dependency on Lucene directly.

@elasticsearchmachine
Copy link
Collaborator

Pinging @elastic/es-search (Team:Search)

@elasticsearchmachine
Copy link
Collaborator

Hi @ChrisHegarty, I've created a changelog YAML for you.

Copy link
Contributor

@jimczi jimczi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@ChrisHegarty
Copy link
Contributor Author

@elasticmachine update branch

@ChrisHegarty ChrisHegarty added auto-merge Automatically merge pull request when CI checks pass (NB doesn't wait for reviews!) auto-backport-and-merge Automatically create backport pull requests and merge when ready labels May 18, 2024
@ChrisHegarty
Copy link
Contributor Author

@elasticmachine update branch

@ChrisHegarty
Copy link
Contributor Author

@elasticmachine update branch

@ChrisHegarty
Copy link
Contributor Author

@elasticmachine update branch

@ChrisHegarty
Copy link
Contributor Author

@elasticmachine update branch

@ChrisHegarty
Copy link
Contributor Author

@elasticmachine run elasticsearch-ci/bwc-snapshots

@ChrisHegarty
Copy link
Contributor Author

@elasticmachine update branch

@ChrisHegarty
Copy link
Contributor Author

@elasticmachine update branch

@ChrisHegarty
Copy link
Contributor Author

@elasticmachine update branch

@elasticsearchmachine elasticsearchmachine merged commit a7e4423 into elastic:main May 19, 2024
15 checks passed
@ChrisHegarty ChrisHegarty deleted the vecscorer_copy branch May 19, 2024 14:23
@elasticsearchmachine
Copy link
Collaborator

💔 Backport failed

Status Branch Result
8.14 Commit could not be cherrypicked due to conflicts

You can use sqren/backport to manually backport by running backport --upstream elastic/elasticsearch --pr 108802

ChrisHegarty added a commit to ChrisHegarty/elasticsearch that referenced this pull request May 19, 2024
This commit fixes a potential multithreading issue with the lib vec
vector scorer.

Since the implementation falls back to a lucene scorer which needs to
read from the index input, then we need to make a copy of the index
input. Otherwise, there is a potential for the stateful index input to
be accessed across threads - which would be bad.

The fallback is only used when one or other vector cross a segment
boundary, which is 16G by default. So the likelihood of this occurring
in practice is small, but the affect is bad.

The fix is deliberately small and targeted, so that it can be
backported. After this change, I'm going to drop the custom VectorScorer
and adapter type, in favour of using the Lucene type directly. This
custom types were initially used when the code lived inside the native
module, where we didn't want to add a dependency on Lucene directly.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
auto-backport-and-merge Automatically create backport pull requests and merge when ready auto-merge Automatically merge pull request when CI checks pass (NB doesn't wait for reviews!) backport pending >bug :Search/Vectors Vector search Team:Search Meta label for search team v8.14.1 v8.15.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants