Break patterns for the beginning/end of the word should not be applied in the middle #91

Open
dimztimz opened this issue Oct 16, 2020 · 0 comments

dimztimz commented Oct 16, 2020

Break patterns for the beginning and the end of the word shouldn't be considered in the recursive calls. IMO, they should apply to the whole word and not to a part of it. For example, given the following break patterns

BREAK 2
BREAK ^+
BREAK -

and the word abc-+xyz, the algorithm will first check the whole word, then split it into abc and +xyz, and then trim +xyz to xyz because of the break pattern BREAK ^+. That pattern is meant to match only at the beginning of the word, and it should stay that way. Hunspell behaves the same way; I think that is accidental and should be fixed. I still need to retest and reanalyze the Hunspell code for break patterns, but I'm pretty sure it works this way and that it is by accident.
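
To make the difference concrete, here is a minimal Python sketch of the recursion described above. It is a simplified model, not the actual Nuspell or Hunspell code: the function `spell_break`, the flag `anchored_everywhere`, the `at_start`/`at_end` parameters, and the two-word toy dictionary are all assumptions made for illustration. With `anchored_everywhere=True` the anchored patterns are tried on every recursive part (the behavior reported here); with `False` they are tried only where the part actually touches the start or end of the whole word (the behavior this issue proposes).

```python
# Hypothetical toy setup; only the two BREAK patterns come from the issue.
DICTIONARY = {"abc", "xyz"}
START_PATTERNS = ["+"]   # from "BREAK ^+"
END_PATTERNS = []        # no end-anchored pattern in this example
MIDDLE_PATTERNS = ["-"]  # from "BREAK -"


def spell_break(word, anchored_everywhere, at_start=True, at_end=True):
    """Return True if word is accepted, possibly after breaking it.

    at_start / at_end record whether this part still touches the
    beginning / end of the original whole word.
    """
    if word in DICTIONARY:
        return True

    # Start-anchored patterns: strip the prefix and check the rest.
    if anchored_everywhere or at_start:
        for pat in START_PATTERNS:
            if word.startswith(pat) and spell_break(
                    word[len(pat):], anchored_everywhere, False, at_end):
                return True

    # End-anchored patterns: strip the suffix and check the rest.
    if anchored_everywhere or at_end:
        for pat in END_PATTERNS:
            if word.endswith(pat) and spell_break(
                    word[:-len(pat)], anchored_everywhere, at_start, False):
                return True

    # Middle patterns: split strictly inside the word; both parts must pass.
    for pat in MIDDLE_PATTERNS:
        i = word.find(pat, 1)
        while i != -1 and i + len(pat) < len(word):
            left, right = word[:i], word[i + len(pat):]
            if (spell_break(left, anchored_everywhere, at_start, False)
                    and spell_break(right, anchored_everywhere, False, at_end)):
                return True
            i = word.find(pat, i + 1)
    return False


print(spell_break("abc-+xyz", anchored_everywhere=True))   # True
print(spell_break("abc-+xyz", anchored_everywhere=False))  # False
print(spell_break("+xyz", anchored_everywhere=False))      # True
```

In this model, abc-+xyz is accepted only when ^+ is allowed to fire on the inner part +xyz. With the restriction in place, +xyz on its own is still accepted, because there the + really is at the beginning of the whole word.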

Originally posted by @dimztimz in #85 (comment)
