Failures with quotes #228

Open
phajdan opened this issue Dec 20, 2014 · 9 comments · May be fixed by #613

Comments

Contributor

phajdan commented Dec 20, 2014

Running hunspell in Emacs, or directly with "hunspell -a": single quote is taken as part of
the word, which leads to tons of bogus spelling suggestions. I'd expect something like this
to be known, but a few searches got me nowhere. This is using the version that comes with
Fedora 20: 1.3.3.

Original comment by: elibarzilay

Original Ticket: hunspell/bugs/259


tbm commented Apr 6, 2018

I see the same issue with 1.4.1.


tbm commented Apr 6, 2018

Still there in 1.6.2. Simple test case:

The next stable release of Debian is `buster'

hunspell flags buster' (with the trailing quote) as a misspelled word.


tbm commented Apr 6, 2018

This sounds related: en-wl/wordlist#122


tbm commented Apr 6, 2018

Related to #504


lmmarsano commented Dec 7, 2018

Same issue nearly 4 years later.
According to the ChangeLog

hunspell/ChangeLog

Lines 89 to 95 in 4ddd8ed

* better apostrophe usage:
- WORDCHARS only with one of the Unicode or ASCII apostrophe
results extended word tokenization: both of them will be part of
the words (if they are inside: eg. word's, but not words').
- convert Unicode apostrophes to ASCII ones for 8-bit dictionaries
(eg. English dictionaries), or for UTF-8 dictionaries only
with ASCII apostrophe supports (eg. French dictionaries).

tokenization should treat interior apostrophes as part of words and exclude boundary apostrophes.
However, the test provided in lmmarsano/hunspell@c825888 fails the assertion; please check it out to see:

luism@lmm-notebook:~/project/hunspell/tests$ ./test.sh apostrophe.dic
=============================================
Fail in apostrophe.good. Good words recognised as wrong:
'is'

I wish I knew enough to PR a fix.
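
To make the expected behaviour concrete, here is a minimal sketch of the rule the ChangeLog describes: apostrophes inside a word belong to it, apostrophes at the boundary do not. This is a hypothetical illustration in Python, not Hunspell's actual tokenizer; the names `WORD_RE` and `tokenize` are made up for the example.

```python
import re

# Hypothetical illustration only, not Hunspell's tokenizer.
# A word is a run of letters, optionally joined by single interior
# apostrophes (ASCII ' or the Unicode apostrophe U+2019).
WORD_RE = re.compile(r"[^\W\d_]+(?:['\u2019][^\W\d_]+)*")


def tokenize(text):
    """Return the words in text, keeping interior apostrophes
    and leaving out apostrophes at word boundaries."""
    return WORD_RE.findall(text)


print(tokenize("'is'"))           # ['is']
print(tokenize("word's words'"))  # ["word's", 'words']
print(tokenize("The next stable release of Debian is `buster'"))
```

With a rule like this, the quoted 'is' from the failing test would be checked as is, and buster' from the Debian example would be checked as buster.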

lmmarsano linked a pull request Dec 8, 2018 that will close this issue

loretoparisi commented Mar 14, 2019

This seems to happen for Italian as well:

In "Della morte dell'amore", the tokenizer yields dell, which is considered wrong with suggestions ["del","della","dello","delle","del l"], whereas the output for dell' (note the straight apostrophe) is

{ index: 0,
  word: 'dell\'',
  stems: [],
  suggestion: [ 'della', 'dello', 'delle' ],
  correct: false }


mcepl commented Aug 27, 2019

Is this the root of this problem:

~$ echo "And no, spellchecking doesn’t work well in vim, because exactly this sentence is marked as misspelled." | hunspell -d en_US --check-apostrophe -l
doesn
~$

Hmm, with en_GB it seems to work, so I guess it is dictionary dependent.


astoff commented Mar 5, 2023

This issue is still present in Hunspell 1.7.0, and it affects the en_GB dictionary as well:

$ echo "He asked, 'Why can't I quote?'" | hunspell -d en_GB
Hunspell 1.7.0
*
*
& 'Why 1 10: why
*
*
*
& ' 15 29: e, s, i, a, n, r, t, o, l, c, d, u, g, m, f
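
For anyone reading the output above, a small sketch of how the result lines can be decoded. It assumes the ispell-style layout `& word count offset: suggestions` for misspellings and `*` for accepted words; the function name `parse_result_line` is hypothetical, and other line types are deliberately left unhandled.

```python
def parse_result_line(line):
    """Parse one result line into a dict.

    Assumes the ispell-style layout "& word count offset: s1, s2, ...".
    Returns None for any line type not handled in this sketch.
    """
    line = line.strip()
    if line == "*":
        return {"ok": True}
    if line.startswith("& "):
        head, _, tail = line[2:].partition(": ")
        # Split from the right so the word itself may contain odd characters.
        word, count, offset = head.rsplit(" ", 2)
        return {
            "ok": False,
            "word": word,
            "count": int(count),
            "offset": int(offset),
            "suggestions": tail.split(", "),
        }
    return None


text = "He asked, 'Why can't I quote?'"
for raw in ["& 'Why 1 10: why",
            "& ' 15 29: e, s, i, a, n, r, t, o, l, c, d, u, g, m, f"]:
    result = parse_result_line(raw)
    # Both offsets point at a quote character in the input sentence.
    print(result["word"], result["offset"], repr(text[result["offset"]]))
```

Read this way, both flagged tokens start at a quote character: the opening quote is glued to Why, and the closing quote is reported as a word of its own.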


Atemu commented Apr 7, 2024

This is an issue with the dictionaries. The English hunspell dicts from https://sourceforge.net/projects/wordlist/files/speller/ contain this line in their aff files:

WORDCHARS 0123456789

but this must include the ' character like this:

WORDCHARS 0123456789'

in order to detect contractions like "doesn't" as a single word.

If I manually change the dictionary accordingly, it works as expected.
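
The manual change can be scripted. Below is a minimal sketch, assuming the aff file has a single line starting with `WORDCHARS`; the function name `add_apostrophe_to_wordchars` is hypothetical, and reading and writing the file (with whatever encoding its `SET` line declares) is left to the caller.

```python
def add_apostrophe_to_wordchars(aff_text):
    """Return aff_text with ' appended to the WORDCHARS line if it is missing.

    Sketch only: assumes one line beginning with WORDCHARS and leaves
    every other line untouched.
    """
    lines = aff_text.splitlines(keepends=True)
    for i, line in enumerate(lines):
        if line.startswith("WORDCHARS"):
            body = line.rstrip("\r\n")
            ending = line[len(body):]  # preserve the original line ending
            if "'" not in body:
                lines[i] = body + "'" + ending
            break
    return "".join(lines)


sample = "SET UTF-8\nWORDCHARS 0123456789\n"
print(add_apostrophe_to_wordchars(sample), end="")
```

Running it a second time on its own output changes nothing, so it is safe to apply after a dictionary update. Work on a copy of the aff file rather than the packaged one.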
