Predicts garbage for Bengali input #119

Open
hafiz031 opened this issue Jan 6, 2022 · 2 comments

@hafiz031

hafiz031 commented Jan 6, 2022

I am trying the lookup_compound | Keep original casing example on a Bengali corpus of unigrams and bigrams, using a comma as the separator. It does not seem to work: for any misspelled input, it just outputs a garbage string. The issue occurs in the Python implementation of this package.

@wolfgarbe
Owner

wolfgarbe commented Jan 6, 2022

To look into the issue, I need the following information:

  1. all SymSpell parameters used: prefixLength, maxEditDistanceDictionary, maxEditDistanceLookup, suggestionVerbosity
  2. the Bengali unigram and bigram frequency dictionaries
  3. some Bengali examples: input text, current output text, expected output text
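Since the maintainer asks for the unigram and bigram dictionaries, one quick sanity check before sharing them is to confirm that every line splits cleanly on the chosen comma separator. The sketch below is a hypothetical, library-independent helper (`load_frequency_dict` is not part of SymSpell or symspellpy); it assumes one entry per line in `term,count` form, with a bigram written as two space-separated words before the comma.

```python
import io
from typing import Dict, List, TextIO, Tuple


def load_frequency_dict(stream: TextIO, separator: str = ",") -> Tuple[Dict[str, int], List[int]]:
    """Hypothetical checker: parse 'term<separator>count' lines.

    Returns the parsed entries and the 1-based numbers of malformed lines.
    """
    entries: Dict[str, int] = {}
    bad_lines: List[int] = []
    for lineno, raw in enumerate(stream, start=1):
        line = raw.rstrip("\n")
        if not line.strip():
            continue  # skip blank lines
        # Split only on the last separator so a bigram term like "w1 w2" stays whole.
        parts = line.rsplit(separator, 1)
        if len(parts) != 2:
            bad_lines.append(lineno)  # no separator found on this line
            continue
        term, count_text = parts[0].strip(), parts[1].strip()
        try:
            count = int(count_text)
        except ValueError:
            bad_lines.append(lineno)  # count column is not an integer
            continue
        if not term:
            bad_lines.append(lineno)  # empty term column
            continue
        # Sum counts if the same term appears more than once.
        entries[term] = entries.get(term, 0) + count
    return entries, bad_lines


if __name__ == "__main__":
    sample = io.StringIO("আমি,120\nতুমি,85\nআমি তুমি,7\nbroken line\n")
    parsed, bad = load_frequency_dict(sample)
    print(parsed, bad)
```

A non-empty list of bad line numbers would point at dictionary formatting rather than at the lookup itself; an empty list does not rule out other causes.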

@hafiz031
Author

hafiz031 commented Jan 6, 2022

@wolfgarbe I posted this issue here by mistake. It actually occurred in one of the Python implementations of this package, and I have since re-posted it there: mammothb/symspellpy#110. The unigram and bigram frequency dictionaries are attached to my comment in that thread.
