Rare letters in toki pona [BUG] #4432

Open
gtbot2007 opened this issue Apr 11, 2024 · 4 comments

@gtbot2007

You are unable to use uncommon letters in toki pona sentences. Because of this, words such as "yupekosi", "Pingo" and "kalamARR" (among a few others) cannot be submitted.

@gtbot2007 gtbot2007 added the Bug label Apr 11, 2024
@jessicarose
Collaborator

Hi @gtbot2007, and thanks so much for raising this issue. Could I get a bit more detail on what's preventing uncommon letters from being used? Is this a limitation you've found in the guidelines that could be better clarified, or are you getting a bug or error message when trying to input sentences that contain words like "yupekosi", "Pingo" and "kalamARR"?

@HarikalarKutusu
Contributor

I believe at least the last one happens because of the default validation rules defined below, which try to handle abbreviations:

For other cases you need to create language-specific validation rules, like the ones here:
https://github.com/common-voice/common-voice/tree/main/server/src/core/sentences/validation/languages

@gtbot2007
Author

Since these letters are used in so few words, and those words are also "non-standard", they are considered foreign letters. That is technically true, but maybe there should be an exception to allow the words "Pingo", "yupekosi", "kalamARR", "yutu" and maybe even "yu" and "y".
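
To make the suggestion concrete, here is a rough sketch of what such an exception could look like. It is not taken from the Common Voice code base: the names `TOKI_PONA_LETTERS`, `ALLOWED_EXCEPTIONS`, `isWordAllowed` and `isSentenceAllowed` are made up for illustration, and the only assumption about the language is the standard 14-letter toki pona alphabet (a e i o u j k l m n p s t w).

```typescript
// Hypothetical sketch, NOT the actual Common Voice validation code.
// Standard toki pona alphabet: a e i o u j k l m n p s t w (case-insensitive here).
const TOKI_PONA_LETTERS = /^[aeioujklmnpstw]+$/i;

// Assumed exception list: the words proposed in this thread, matched case-sensitively.
const ALLOWED_EXCEPTIONS = new Set<string>([
  "Pingo", "yupekosi", "kalamARR", "yutu", "yu", "y",
]);

// A word passes if it is an explicit exception or uses only standard letters.
function isWordAllowed(word: string): boolean {
  return ALLOWED_EXCEPTIONS.has(word) || TOKI_PONA_LETTERS.test(word);
}

// Split on whitespace, strip leading/trailing punctuation, then check every word.
function isSentenceAllowed(sentence: string): boolean {
  const words = sentence
    .split(/\s+/)
    .map((w) => w.replace(/^[.,:!?"]+|[.,:!?"]+$/g, ""))
    .filter((w) => w.length > 0);
  return words.every(isWordAllowed);
}
```

With a list like this, the exceptions stay explicit, so the letters "y", "g" and "r" would still be rejected everywhere else.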

@HarikalarKutusu
Contributor

Sorry for the noise, I posted before checking. There IS actually a tok validation file here:
https://github.com/common-voice/common-voice/blob/main/server/src/core/sentences/validation/languages/tok.ts

It seems each of the samples you gave hits one of the regex rules there. Either those words are not wanted, or the regex rules might need tweaking.
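
For anyone who wants to check which rule a given word trips, something along these lines could help. This is only a sketch: the two rules below are guesses at the kind of checks involved (a letter outside the standard 14-letter alphabet, and a run of uppercase letters that looks like an abbreviation), not a copy of what `tok.ts` or the default rules actually contain, and `Rule`, `RULES` and `failedRules` are hypothetical names.

```typescript
// Hypothetical rules for illustration only; the real tok.ts rules may differ.
interface Rule {
  name: string;
  regex: RegExp; // a sentence is rejected when this regex matches
}

const RULES: Rule[] = [
  {
    // Anything that is not a standard toki pona letter, whitespace, or basic punctuation.
    name: "letter outside the toki pona alphabet",
    regex: /[^aeioujklmnpstw\s.,:!?]/i,
  },
  {
    // Two or more consecutive capitals, as in an abbreviation.
    name: "run of uppercase letters (possible abbreviation)",
    regex: /[A-Z]{2,}/,
  },
];

// Return the names of all rules that the sentence violates.
function failedRules(sentence: string): string[] {
  return RULES.filter((rule) => rule.regex.test(sentence)).map((rule) => rule.name);
}

for (const sample of ["yupekosi", "Pingo", "kalamARR", "toki pona li pona."]) {
  console.log(sample, failedRules(sample));
}
```

Under these assumed rules, "yupekosi" and "Pingo" fail only the alphabet check (because of "y" and "g"), while "kalamARR" fails both the alphabet check and the uppercase-run check.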
