Is this possible? Scan dic file and obtain all forms of all files #1

Open
MonsterMMORPG opened this issue Mar 2, 2017 · 6 comments

@MonsterMMORPG

What I want is simple.

I would like to obtain all the words that can be generated from a given dictionary entry.

E.g.

make/UAGS

in the us.dic file.

So I want to obtain every word that can be derived from this word/affix-flag combination.

E.g. the results would be: made, making, makes, etc.

@aarondandy
Owner

This is possible, but really hard! The easiest thing is what I think you want, though: just slap some affixes onto some roots. The example linked here is not totally correct, but it should get you somewhere. There are loads of corner cases that I don't understand, and the example does not even touch on compound words. Hope this helps: https://gist.github.com/aarondandy/aaa622afeeb0cb86b0d4efe697c23be5

@MonsterMMORPG
Author

MonsterMMORPG commented Mar 7, 2017 via email

@rianjs

rianjs commented May 27, 2017

I worked around this by using Hunspell's unmunch command, which generates all forms of all words. This is probably quicker/easier for a one-off job. (It also enables really fast comparisons against a HashSet<string>, at least an order of magnitude faster than Hunspell itself.)
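
As a rough illustration of that workflow, the sketch below loads a pre-generated word list into a Python `set`, which plays the role of the `HashSet<string>`. The file name `words.txt` and the helper names `build_word_set` and `is_known` are assumptions made for this example. The list itself would come from running something like `unmunch en_US.dic en_US.aff > words.txt` once beforehand.

```python
from typing import Iterable, Set


def build_word_set(lines: Iterable[str]) -> Set[str]:
    """Collect one word per line into a set, skipping blank lines."""
    return {line.strip() for line in lines if line.strip()}


def is_known(word: str, words: Set[str]) -> bool:
    """Exact-match lookup; no suggestions or affix analysis happen here."""
    return word in words


# Typical use, assuming words.txt holds the unmunch output:
#
#     with open("words.txt", encoding="utf-8") as handle:
#         words = build_word_set(handle)
#     print(is_known("making", words))
```

The trade-off is memory for speed: every expanded form is kept in memory, and lookups are exact string matches only.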

@MonsterMMORPG
Author

MonsterMMORPG commented May 27, 2017 via email

@rianjs

rianjs commented May 28, 2017

> unmunch doesnt work for UTF8

unmunch may not work for non-ASCII characters, or non-Latin characters, or unusual character encodings, but it absolutely works on UTF-8 files. You may want to read up on Unicode and character encodings.

@MonsterMMORPG
Author

MonsterMMORPG commented May 28, 2017 via email
