Exact word marked as a near miss #1002

Open
ciesiolka opened this issue Mar 20, 2024 · 1 comment

ciesiolka commented Mar 20, 2024

This may not be a bug, but rather my own misunderstanding of how Hunspell dictionaries work.

I am trying to create a dictionary for Latin with accents. Consider the word románus. According to its declension, one of its forms is romanórum. To represent that, I created the following dic and aff files:

1
románus/A
SET UTF-8

SFX A N 1
SFX A us órum

This doesn't work, because the rule only strips -us and so generates románórum, which is invalid since it has two accents. To work around that, I added an OCONV entry:

(...)

OCONV 1
OCONV ánó anó

Running hunspell with that dictionary gives an odd result: románórum is accepted, but romanórum is considered a near miss with suggested spelling romanórum (exactly the same).

Hunspell 1.7.0
románórum
+ románus

romanórum
& romanórum 1 0: romanórum

Maybe I have simply misunderstood how ICONV and OCONV work; the explanation in the man page isn't very detailed.
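
One possible reading of the output above, sketched in Python. This is a toy model, not Hunspell's code, and it rests on an assumption this thread does not confirm: that OCONV only rewrites what Hunspell prints (such as suggestions) and plays no part in deciding whether a word is accepted. The names GENERATED, oconv and check are made up for the illustration, and difflib merely stands in for Hunspell's own suggestion logic.

```python
import difflib

# Forms the dictionary yields with the original rule: the stem plus the SFX result.
GENERATED = {"románus", "románórum"}
# The OCONV table from the aff file, as pattern -> replacement.
OCONV = {"ánó": "anó"}

def oconv(word: str) -> str:
    """Apply the output conversion table to a word that is about to be printed."""
    for pattern, replacement in OCONV.items():
        word = word.replace(pattern, replacement)
    return word

def check(word: str):
    """Accept only generated forms; otherwise suggest the closest one, after OCONV."""
    if word in GENERATED:
        return ("accepted", [])
    # Stand-in for Hunspell's suggestion step (assumption): pick the closest form.
    raw = difflib.get_close_matches(word, sorted(GENERATED), n=1, cutoff=0.8)
    return ("near miss", [oconv(s) for s in raw])

print(check("románórum"))  # ('accepted', [])
print(check("romanórum"))  # ('near miss', ['romanórum'])
```

Under that assumption, the suggestion shown for romanórum would really be the generated form románórum, rewritten by OCONV just before it is printed, which is why it looks identical to the input.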

juozhe commented May 28, 2024

I'd suggest stripping -ánus and appending -anórum instead, with this aff file:

SET UTF-8
SFX A N 1
SFX A ánus anórum

Depending on the stress patterns, you'll probably end up with several flags, one for each stress pattern.

echo "romanórum" | hunspell -d la
+ románus

echo "románus" | hunspell -d la
*
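
To make the difference between the two rules concrete, here is a minimal Python sketch of what a suffix rule does: strip an ending from the stem, then append the new one. It only illustrates the idea and is not how Hunspell is implemented; apply_suffix is a hypothetical helper, and conditions and flags are ignored.

```python
from typing import Optional

def apply_suffix(stem: str, strip: str, add: str) -> Optional[str]:
    """Apply one SFX-style rule: remove `strip` from the end of `stem`, append `add`."""
    if strip and not stem.endswith(strip):
        return None  # the rule does not apply to this stem
    base = stem[: len(stem) - len(strip)] if strip else stem
    return base + add

# Original rule: SFX A us órum (the accent on the stem survives)
print(apply_suffix("románus", "us", "órum"))      # románórum

# Suggested rule: SFX A ánus anórum (the accented vowel is stripped with the ending)
print(apply_suffix("románus", "ánus", "anórum"))  # romanórum
```

Because the stripped part includes the accented vowel, the generated form carries a single accent and no OCONV entry is needed.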
