REP: support suggesting a lowercase replacement for an all-caps input #924

Open
Gallaecio opened this issue Apr 6, 2023 · 7 comments

@Gallaecio

Imagine a language where “bar” is a good word, and “FOO” is a common misspelling:

dic:

1
bar

aff:

REP 1
REP ^FOO$ bar

At the moment, this suggestion does not work as I would expect. “bar” is not suggested for “FOO”. Nothing is suggested for “FOO”, in fact.

The best workaround I could find was using “foo” in the REP rule, but then the suggestion you get for “FOO” is “BAR”, not “bar”.

In an actual language, this may be relatively common when an acronym needs to be replaced with a regular word (e.g. SOS → help signal). I found this when trying to suggest “identificador único universal” as a replacement for “UUID” in Galician.

@shantanuo
Contributor

You can write your dic file as:

1
bar ph:FOO

If you type FOO, then bar will be suggested on right-click.

@Gallaecio
Author

That is good to know, but that would not work for multi-word replacements, i.e. you cannot do:

1
identificador único universal ph:UUID

@shantanuo
Contributor

shantanuo commented Jan 9, 2024

Replace each space with an underscore.

REP 2
REP SOS help_signal
REP UUID identificador_único_universal

Make sure that all 5 words (help, signal, identificador, único, universal) are part of the dictionary.

@Gallaecio
Author

That’s what I tried, and it did not work (there was no suggestion for UUID), although I did use ^ and $ as seen in my original report.

@shantanuo
Contributor

I guess the REP tag does not support ^ and $.
Did you try without them, as shown in my example?

@Gallaecio
Author

Gallaecio commented Jan 22, 2024

Removing ^ and $ does not make a difference; it still does not work as I would hope:

$ cat test.aff 
SET UTF-8
REP 1
REP uuid identificador_único_universal
$ cat test.dic 
3
identificador
único
universal
$ echo UUID uuid Uuid | hunspell -d test
Hunspell 1.7.2
& UUID 1 0: IDENTIFICADOR ÚNICO UNIVERSAL
& uuid 1 5: identificador único universal
& Uuid 1 10: Identificador único universal

I want UUID to suggest identificador único universal, not IDENTIFICADOR ÚNICO UNIVERSAL.

If I use UUID instead of uuid in REP, I get no suggestions at all.

@shantanuo
Contributor

Understood. It took me some time :)
This is not a bug, but it is certainly a good feature request. There should be a flag to disable capital-letter adjustments for suggestions that come from the REP (replacement) table.
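
For anyone following along, the behaviour reported above can be reproduced with a small toy model in Python. This is only a sketch of the observed output, not Hunspell's actual implementation: it assumes the input is lowercased before the REP table is applied and that the input's capitalization is then re-applied to each suggestion. The names `cap_type`, `suggest` and the `keep_case` parameter are hypothetical; `keep_case` stands in for the flag requested in this issue and does not exist in Hunspell.

```python
def cap_type(word):
    """Classify capitalization (simplified): 'all', 'init' or 'none'."""
    if word.isupper():
        return "all"
    if word[:1].isupper() and word[1:] == word[1:].lower():
        return "init"
    return "none"


def suggest(word, rep_table, dictionary, keep_case=False):
    """Toy model of REP-based suggestions with case re-application.

    keep_case is a hypothetical switch modelling the requested flag:
    when True, suggestions are returned exactly as written in REP.
    """
    lowered = word.lower()
    suggestions = []
    for pattern, replacement in rep_table:
        # The pattern is matched against the lowercased input (assumption).
        if pattern in lowered:
            # An underscore in the replacement stands for a space.
            candidate = lowered.replace(pattern, replacement.replace("_", " "), 1)
            # Every resulting word must be in the dictionary.
            if all(part in dictionary for part in candidate.split()):
                suggestions.append(candidate)
    if keep_case:
        return suggestions
    kind = cap_type(word)
    if kind == "all":
        return [s.upper() for s in suggestions]
    if kind == "init":
        return [s[:1].upper() + s[1:] for s in suggestions]
    return suggestions


dictionary = {"identificador", "único", "universal"}
rep_lower = [("uuid", "identificador_único_universal")]
rep_upper = [("UUID", "identificador_único_universal")]

for token in ("UUID", "uuid", "Uuid"):
    print(token, suggest(token, rep_lower, dictionary))
print("UUID", suggest("UUID", rep_upper, dictionary))
print("UUID", suggest("UUID", rep_lower, dictionary, keep_case=True))
```

Under these assumptions the model matches the three suggestions in the hunspell output above and also returns nothing when the REP pattern is written as UUID, because an uppercase pattern never matches the lowercased input. With the hypothetical `keep_case=True`, UUID would yield the lowercase “identificador único universal”.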
