Suggestion: Split dictionary and core #62

Open
jirisykora83 opened this issue Mar 25, 2019 · 1 comment

@jirisykora83

I tried SymSpell and it looks great. But one thing I noticed almost immediately is that it brings a dictionary file into my project (even if I do not use it). I understand that it helps with a quick start, but I strongly believe that in real applications most users will supply their own dictionary. Even if they don't, I think it would be better to split the NuGet package into, for example, SymSpell.Core and SymSpell.Dic.En. To keep compatibility, SymSpell could be composed of these two packages (something like Microsoft.AspNetCore.App).

@KOLANICH

I have created an abstraction layer for Python over several libraries that do word splitting. All of the libraries take some kind of word list. Some ship the word list with them, some (like this one) don't. I think that having a separate word list for each library is redundant and a violation of the DRY principle (it may be convenient for the developers not to depend on compatibility, but it is still harmful).

I think we need a common spec for dictionary files, because quite a lot of software uses them. These wordlists could then be installed and updated separately, so I wouldn't have to keep track of which library requires a wordlist and which bundles its own.
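
To make the idea concrete, here is a minimal sketch of what a shared loader could look like. It assumes a plain-text format with one `term count` pair per line, separated by whitespace. The function name `load_wordlist`, the `#` comment rule, and the default count of 1 are all hypothetical choices for illustration, not part of any existing spec or library, and the sample counts are made up.

```python
from io import StringIO
from typing import Dict, Iterable


def load_wordlist(lines: Iterable[str]) -> Dict[str, int]:
    """Parse 'term count' lines into a term -> frequency mapping.

    Hypothetical shared format (an assumption, not an existing spec):
    one entry per line, whitespace-separated, optional count.
    """
    freqs: Dict[str, int] = {}
    for raw in lines:
        line = raw.strip()
        # Skip blank lines and '#' comments (an assumed convention).
        if not line or line.startswith("#"):
            continue
        parts = line.split()
        term = parts[0]
        # A missing count defaults to 1 (also an assumption).
        count = int(parts[1]) if len(parts) > 1 else 1
        # Duplicate terms have their counts summed.
        freqs[term] = freqs.get(term, 0) + count
    return freqs


# Made-up sample data standing in for a separately installed wordlist file.
sample = StringIO("the 100\nof 50\n# a comment\nhello\nthe 5\n")
words = load_wordlist(sample)
```

Because the loader accepts any iterable of lines, each library could be handed the same parsed mapping regardless of where the wordlist package is installed.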

I also wonder whether there are any benchmark datasets for such a task. I mean not only a speed benchmark but also one for the quality of splitting, and maybe even one that classifies the errors the implementations cannot avoid.
