Fix multithreading copies in lib vec #108802
Pinging @elastic/es-search (Team:Search)
Hi @ChrisHegarty, I've created a changelog YAML for you.
LGTM
@elasticmachine update branch
@elasticmachine run elasticsearch-ci/bwc-snapshots
💔 Backport failed
You can use sqren/backport to manually backport by running
This commit fixes a potential multithreading issue with the lib vec vector scorer.
Since the implementation falls back to a Lucene scorer, which needs to read from the index input, we need to make a copy of the index input. Otherwise, the stateful index input could be accessed concurrently from multiple threads, and their seeks and reads would interfere with each other's file position.
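To illustrate why the copy matters, here is a minimal, self-contained sketch of the copy-per-scorer pattern. It does not use Lucene: `StatefulInput` is a hypothetical stand-in for a stateful input such as Lucene's `IndexInput` (a shared backing store plus a mutable cursor), and `FallbackScorer` and `scoreAll` are hypothetical names invented for this example. Each scorer takes its own copy in its constructor, so threads never share a cursor. This is an assumption-laden illustration, not the code from this PR.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public final class FallbackCopyDemo {

    /** Hypothetical stand-in for a stateful, non-thread-safe input such as Lucene's IndexInput. */
    static final class StatefulInput {
        private final float[] data; // backing data, shared read-only by all copies
        private int pos;            // per-instance cursor: the mutable state

        StatefulInput(float[] data) { this.data = data; }

        void seek(int pos) { this.pos = pos; }

        float readFloat() { return data[pos++]; }

        /** A copy shares the backing data but gets its own independent cursor. */
        @Override
        public StatefulInput clone() {
            StatefulInput copy = new StatefulInput(data);
            copy.pos = pos;
            return copy;
        }
    }

    /** Hypothetical fallback scorer that reads each vector by seeking in the input. */
    static final class FallbackScorer {
        private final StatefulInput input;
        private final int dims;

        FallbackScorer(StatefulInput shared, int dims) {
            // The fix: take a private copy instead of reading the shared instance directly
            this.input = shared.clone();
            this.dims = dims;
        }

        float dotProduct(int ord, float[] query) {
            input.seek(ord * dims);
            float sum = 0f;
            for (int i = 0; i < dims; i++) {
                sum += input.readFloat() * query[i];
            }
            return sum;
        }
    }

    /** Scores every stored vector against the query, with one scorer per worker thread. */
    public static float[] scoreAll(float[] vectors, int dims, float[] query, int threads) {
        StatefulInput shared = new StatefulInput(vectors);
        int count = vectors.length / dims;
        float[] scores = new float[count];
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<?>> futures = new ArrayList<>();
            for (int t = 0; t < threads; t++) {
                final int start = t;
                futures.add(pool.submit(() -> {
                    FallbackScorer scorer = new FallbackScorer(shared, dims);
                    for (int ord = start; ord < count; ord += threads) {
                        scores[ord] = scorer.dotProduct(ord, query);
                    }
                }));
            }
            for (Future<?> f : futures) {
                f.get(); // waits for completion and makes the writes to scores visible
            }
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
        return scores;
    }
}
```

If `FallbackScorer` stored `shared` directly instead of a copy, all worker threads would seek and read through one cursor, and a thread could read floats belonging to another thread's vector. Copying is cheap because only the cursor is duplicated, not the data.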
The fallback is only used when one or both of the vectors being compared cross a memory segment boundary, and segments are 16G by default. So the likelihood of this occurring in practice is small, but the effect (concurrent reads through a shared stateful input) is bad.
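As a rough illustration of how rare the fallback is, the sketch below checks whether a vector's byte range straddles a chunk boundary. It assumes (not stated in the PR) that the "segment" is the memory-mapped chunk used when reading the file, that the default chunk size is 16 GiB (2^34 bytes), and that vectors are stored contiguously from a base offset. `ChunkBoundary` and `needsFallback` are hypothetical names for this example only.

```java
public final class ChunkBoundary {

    /** Assumed default memory-segment (chunk) size: 16 GiB, i.e. 2^34 bytes. */
    public static final long DEFAULT_CHUNK_SIZE = 1L << 34;

    /** True if the byte range [offset, offset + length) spans more than one chunk. */
    public static boolean crossesBoundary(long offset, long length, long chunkSize) {
        long firstChunk = offset / chunkSize;
        long lastChunk = (offset + length - 1) / chunkSize;
        return firstChunk != lastChunk;
    }

    /**
     * Hypothetical check: vectors are assumed to be stored contiguously from baseOffset,
     * each vectorByteSize bytes long. The fallback is needed if either vector of the
     * pair straddles a chunk boundary.
     */
    public static boolean needsFallback(long baseOffset, int ord1, int ord2,
                                        int vectorByteSize, long chunkSize) {
        long off1 = baseOffset + (long) ord1 * vectorByteSize;
        long off2 = baseOffset + (long) ord2 * vectorByteSize;
        return crossesBoundary(off1, vectorByteSize, chunkSize)
                || crossesBoundary(off2, vectorByteSize, chunkSize);
    }
}
```

For example, with 768-dimensional float vectors (3072 bytes each) stored from offset 0, ordinal 5592405 starts at byte 17,179,868,160, which is 1,024 bytes before the first 2^34 boundary, so it straddles it; ordinal 5592404 ends just before that point and does not. Because a vector is far smaller than a chunk, at most one vector straddles each boundary.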
The fix is deliberately small and targeted, so that it can be backported. After this change, I'm going to drop the custom VectorScorer and adapter types in favour of using the Lucene types directly. These custom types were initially used when the code lived inside the native module, where we didn't want to add a direct dependency on Lucene.