Word "C++" is tokenized incorrectly and can not be whitelisted #272

Open
ravenexp opened this issue Aug 17, 2022 · 3 comments
Labels: bug (Something isn't working), checker / hunspell (hunspell checker related topics), tokenization

Comments

@ravenexp

Describe the bug

It is not possible to whitelist the word "C++" by adding it to the local Hunspell dictionary.

Adding "^[cC][+][+]$" to the transform_regex list also does not help.

To Reproduce

Steps to reproduce the behaviour:

  1. Create a file containing the word "C++".
  2. Add "C++" into the local Hunspell dictionary.
  3. Run cargo spellcheck ....
  4. A spelling error message is displayed for every "+" in "C++".

Expected behavior

Hunspell finds "C++" in the local dictionary and accepts it as correct.

Screenshots

error: spellcheck(Hunspell)
    --> /home/x/y.md:252
     |
 252 | Specifically, the GNU C++ compiler version 8.2 or newer and
     |                        ^
     |   Possible spelling mistake found.
error: spellcheck(Hunspell)
    --> /home/x/y.md:252
     |
 252 | Specifically, the GNU C++ compiler version 8.2 or newer and
     |                         ^
     |   Possible spelling mistake found.

Please complete the following information:

  • System: Arch Linux
  • Obtained: pacman
  • Version: cargo-spellcheck 0.11.2

@ravenexp added the bug label Aug 17, 2022

@ravenexp
Author

Oh, I've accidentally found a workaround while figuring out how to make cargo-spellcheck not complain about "—" (EM-DASH).

Adding

transform_regex = [..., "^[+]$"]

to the config makes cargo-spellcheck accept "C++" as a correct word.
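
For reference, a minimal sketch of where this entry would sit in the configuration file. The `[Hunspell.quirks]` table name is an assumption based on the layout printed by cargo spellcheck config --stdout, so check it against your own output. The diagnostics above flag each "+" on its own, which suggests the tokenizer hands "+" to the checker as a separate token; that would explain why a pattern for the whole "C++" never matches while a pattern for a single "+" does.

```toml
# Assumed location: transform_regex under the Hunspell quirks table.
[Hunspell.quirks]
transform_regex = [
    # ... keep any patterns already present in your list here ...
    "^[+]$",  # matches a token that consists of exactly one "+"
]
```

As far as I can tell, a pattern without capture groups whitelists the matched token outright, so this accepts any stand-alone "+" in the checked text, not only the ones in "C++".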

@drahnr
Owner

drahnr commented Aug 17, 2022

The workaround is, yes, exactly this: allow + tokens. Tokenization is done by a third-party library and will never be perfect. Either wrap the term in backticks or add the workaround you found.

If you would like to make spellcheck aware of additional splitchars, there is tokenization_splitchars in [Hunspell].
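
A sketch of what that could look like, assuming tokenization_splitchars takes a single string of separator characters and that listing "+" there makes the tokenizer split on it instead of passing it to Hunspell. The character set below is illustrative only and is not the actual default. Setting the key presumably replaces the default set, so start from the value shown by cargo spellcheck config --stdout and append "+".

```toml
[Hunspell]
# Illustrative separator set, NOT the real default: copy your current value
# and append "+". Assumption: "C++" is then split at each "+", so only "C"
# reaches the spell checker.
tokenization_splitchars = "\",;:.!?#(){}[]|/_-+"
```

Compared with the transform_regex workaround, this changes how words are split rather than whitelisting a token after the fact, so it also affects other words that contain "+".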

@drahnr added the checker / hunspell and tokenization labels Aug 17, 2022

@ravenexp
Author

> If you would like to make spellcheck aware of additional splitchars, there is tokenization_splitchars in [Hunspell].

Thanks, that's even better!

BTW, it's not mentioned in

https://github.com/drahnr/cargo-spellcheck/blob/master/docs/configuration.md

and I had to run cargo spellcheck config --stdout to find out about this parameter.
