Missing form_of keys for some senses #514

Open
Vuizur opened this issue Feb 23, 2024 · 7 comments

@Vuizur
Contributor

Vuizur commented Feb 23, 2024

A few senses are missing the form_of key.
I ran a heuristic check: I searched for senses whose glosses contain the word "singular" but that have neither a form_of nor an alt_of key.
The results are as follows:

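| lang_code | count |
|-----------|-------|
| la | 99302 |
| it | 39869 |
| fr | 13659 |
| ru | 13499 |
| es | 9778 |
| pl | 6413 |
| de | 6282 |
| nl | 3905 |
| hu | 3756 |
| gl | 3307 |
| hi | 2878 |
| sh | 2501 |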

A few are false positives, but for languages like Italian and French it is a fairly widespread occurrence.
Example words:
Italian: ami, tuba, impala, copula, replica, musica, pesca.
French: cube, abuse, azure, update, love
Latin: multi, maximum, visa, gemini
Russian: используешь, удачи, малого, змея
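
For anyone who wants to repeat the check, here is a minimal sketch of the heuristic described above. It assumes the data is in the JSON Lines layout (one entry per line, with `senses`, `glosses`, and `lang_code` fields); the function names and the two sample entries are made up for illustration, and this is not the exact script that produced the counts.

```python
import json
from collections import Counter


def sense_lacks_form_of(sense):
    """True if a gloss mentions "singular" but the sense has neither form_of nor alt_of."""
    glosses = sense.get("glosses", [])
    mentions_singular = any("singular" in gloss for gloss in glosses)
    return mentions_singular and "form_of" not in sense and "alt_of" not in sense


def count_suspects(lines):
    """Count suspicious senses per lang_code over an iterable of JSON lines."""
    counts = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue
        entry = json.loads(line)
        for sense in entry.get("senses", []):
            if sense_lacks_form_of(sense):
                counts[entry.get("lang_code", "?")] += 1
    return counts


# Hypothetical sample entries, only to show the shape of the input.
sample = [
    json.dumps({"word": "ami", "lang_code": "it", "senses": [
        {"glosses": ["inflection of amare:",
                     "second-person singular present indicative"]}]}),
    json.dumps({"word": "amas", "lang_code": "es", "senses": [
        {"glosses": ["second-person singular present indicative of amar"],
         "form_of": [{"word": "amar"}]}]}),
]
print(count_suspects(sample).most_common())  # prints [('it', 1)]
```

To run it on a real dump, pass an open file object instead of `sample`. Matching on the substring "singular" is deliberately crude, which is where the false positives come from.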

Thanks a lot for all the recent commits!

@kristian-clausal
Collaborator

Verb

ami

    inflection of amare:
        second-person singular present indicative
        first/second/third-person singular present subjunctive
        third-person singular imperative

It's because of this formatting. form-of detection requires the gloss to contain certain keywords together with an "of", so "second-person singular present indicative of amare" qualifies, but "inflection of amare:" followed by a sublist item such as "second-person singular present indicative" doesn't. I'll take a more in-depth look at it next week, unless it turns out to be trivial.
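
To make the difference concrete, here is a rough sketch of a "keywords plus of" rule. It is only an illustration of the idea: the keyword list, the regular expression, and the name `parse_form_of` are assumptions made for this example, not the parser's actual code.

```python
import re

# Hypothetical, heavily shortened keyword list; the real list of tags is much longer.
TAG_WORDS = r"(?:singular|plural|indicative|subjunctive|imperative|participle)"

# A gloss qualifies only if a tag keyword appears before " of <lemma>".
FORM_OF_RE = re.compile(
    rf"^(?P<tags>.*\b{TAG_WORDS}\b.*?)\s+of\s+(?P<lemma>[^:]+)$"
)


def parse_form_of(gloss):
    """Return (lemma, tags) if the gloss looks like '<tags> of <lemma>', else None."""
    match = FORM_OF_RE.match(gloss.strip())
    if match is None:
        return None
    return match.group("lemma").strip(), match.group("tags").strip()


for gloss in [
    "second-person singular present indicative of amare",
    "inflection of amare:",
    "second-person singular present indicative",
]:
    print(repr(gloss), "->", parse_form_of(gloss))
```

Only the first gloss yields a result. The header line has an "of" but no keyword, and the sublist item has keywords but no "of", so neither qualifies on its own.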

@kristian-clausal
Collaborator

kristian-clausal commented Feb 23, 2024

I've committed a kludge (inserting it into a list of kludges) that specifically handles "inflection of" followed by a sublist of entries like these. If you could check the data again later to see how much is improved, it would be appreciated.
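
Roughly, the idea is to fold the "inflection of X:" header into each sublist entry so that every entry reads like an ordinary form-of gloss. The sketch below only illustrates that idea; the function name, the regular expression, and the assumption that header and sublist items arrive as one flat list of gloss strings are all hypothetical, and the committed code may look quite different.

```python
import re

# Matches a header gloss such as "inflection of amare:" and captures the lemma.
HEADER_RE = re.compile(r"^inflection of (?P<lemma>.+?):$")


def flatten_inflection_glosses(glosses):
    """Rewrite ['inflection of X:', 'a', 'b'] as ['a of X', 'b of X'].

    Any list that does not start with such a header, or that has no
    sublist items after it, is returned unchanged (as a new list).
    """
    if len(glosses) < 2:
        return list(glosses)
    match = HEADER_RE.match(glosses[0].strip())
    if match is None:
        return list(glosses)
    lemma = match.group("lemma")
    return [f"{item.strip()} of {lemma}" for item in glosses[1:]]


print(flatten_inflection_glosses([
    "inflection of amare:",
    "second-person singular present indicative",
    "third-person singular imperative",
]))
```

After the rewrite each item has the shape "second-person singular present indicative of amare", which is the shape the keyword-plus-"of" rule accepts.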

@Vuizur
Contributor Author

Vuizur commented Feb 23, 2024

I'll do it with the next released dump. 👍 Thanks a lot for the work!

@herschelrs

I've found quite a few of these; I'm working specifically with the Spanish dataset. They seem to happen with a handful of specific conjugations, and with some verbs with less common morphology (specifically reflexive verbs whose lemmas are recorded under the infinitive + reflexive pronoun, rather than as a sense under the plain infinitive).

I found 18534 entries with 'args': {'1': 'es', '2': 'verb form'} in a head_template but no sense with a form_of key.
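
A minimal sketch of that check, in case someone wants to reproduce it. It assumes each entry stores its head templates as a list under a `head_templates` key, each with an `args` dict, and that the data is one JSON object per line; the function names and sample entries are invented for the example, and this is not the exact script behind the 18534 figure.

```python
import json


def is_spanish_verb_form(entry):
    """True if any head template has args '1' == 'es' and '2' == 'verb form'."""
    for template in entry.get("head_templates", []):
        args = template.get("args", {})
        if args.get("1") == "es" and args.get("2") == "verb form":
            return True
    return False


def lacks_form_of(entry):
    """True if no sense of the entry carries a form_of key."""
    return not any("form_of" in sense for sense in entry.get("senses", []))


def find_suspects(lines):
    """Return the words of verb-form entries that have no form_of in any sense."""
    suspects = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        entry = json.loads(line)
        if is_spanish_verb_form(entry) and lacks_form_of(entry):
            suspects.append(entry.get("word"))
    return suspects


# Hypothetical sample entries, only to show the shape of the input.
sample = [
    json.dumps({"word": "example1",
                "head_templates": [{"args": {"1": "es", "2": "verb form"}}],
                "senses": [{"glosses": ["inflection of some verb:"]}]}),
    json.dumps({"word": "example2",
                "head_templates": [{"args": {"1": "es", "2": "verb form"}}],
                "senses": [{"glosses": ["some form of some verb"],
                            "form_of": [{"word": "some verb"}]}]}),
]
print(find_suspects(sample))  # prints ['example1']
```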

@kristian-clausal
Collaborator

Kaikki has finally updated, and it seems like form-of is present now in at least 'impala' (Italian) and 'azure' (French).

This doesn't mean everything is now fixed, of course, but it will help to find the next issue.

@kristian-clausal
Collaborator

What does the situation look like, currently?

@Vuizur
Contributor Author

Vuizur commented Apr 30, 2024

The external hard drive I ran the calculations on seems to be dying, so I will have to find another way to repeat it 😅.
