Question re: adding extra dictionaries #26

Open
akotranza opened this issue Jul 22, 2020 · 2 comments

akotranza commented Jul 22, 2020

node 12.10.x
nspell 2.1.2

I'm not sure if this is a bug or my misinterpretation of the documentation.

Adding extra dictionaries via

new nspell([{aff: mainaff_buff, dic: maindic_buff}, {dic: extradic_buff}]), or
nspell_instance.dictionary(<extradic_buff>), or
nspell_instance.dictionary(['some','new','words'].join('\n'));

appears to cause all inputs to be considered correct by .correct and .suggest

I do see that the nspell_instance object contains all of the words from both maindic and extradic as well as all of the affix information, but it does not seem to be used 😕

However, using .personal to add extra words does work as expected and testing with both maindic and the extra words produces correct .correct and .suggest outputs.

I'm not sure if I am missing something in the documentation or if this is an issue. I am using the dictionary-en package to load the main aff & dic, and the extra dictionaries are plain word lists in utf-8 loaded into a buffer.

Steps to reproduce:

A. baseline, single dictionary

const NSpell = require('nspell')
const en = require('dictionary-en')

en(function (err, maindic) {
  if (err) throw err
  const nspell = new NSpell(maindic)

  nspell.correct('ultrasonogram') // => false; OK b/c not in dictionary
  nspell.correct('ultrasongram') // => false; OK
  nspell.correct('feleing') // => false; 👍

  nspell.suggest('ultrasonogram') // => [ ]; OK
  nspell.suggest('ultrasongram') // => [ ]; OK
  nspell.suggest('feleing') // => ['feeling', 'fleeing', ...]; 👍
})

B. add words via .dictionary (same behavior if it's another buffer passed in to constructor)

const NSpell = require('nspell')
const en = require('dictionary-en')

en(function (err, maindic) {
  if (err) throw err
  const nspell = new NSpell(maindic)
  nspell.dictionary(['ultrasonogram','ultrasonosurgery'].join('\n'))

  nspell.correct('ultrasonogram') // => true
  nspell.correct('ultrasongram') // => true; 👎
  nspell.correct('feleing') // => true; 👎

  nspell.suggest('ultrasonogram') // => []
  nspell.suggest('ultrasongram') // => []
  nspell.suggest('feleing') // => []
})

C. add words via .personal

const NSpell = require('nspell')
const en = require('dictionary-en')

en(function (err, maindic) {
  if (err) throw err
  const nspell = new NSpell(maindic)
  // .personal, not .dictionary: this is the variant that works as expected
  nspell.personal(['ultrasonogram','ultrasonosurgery'].join('\n'))

  nspell.correct('ultrasonogram') // => true
  nspell.correct('ultrasongram') // => false
  nspell.correct('feleing') // => false

  nspell.suggest('ultrasonogram') // => []
  nspell.suggest('ultrasongram') // => ['ultrasonogram'] 👍
  nspell.suggest('feleing') // => ['feeling', 'fleeing', ...] 👍
})
Owner

wooorm commented Jul 22, 2020

First: dictionaries must start with a number of how many items they’ll contain. And also: they’re funky and have to be made to work with the other affix file, so it’s probably better to work with .personal.
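
To illustrate the first point, here is a minimal sketch of building .dic-style source text with the entry count on its first line. `toDicSource` is a hypothetical helper name, not part of nspell, and it only covers plain word lists without affix flags.

```javascript
// Hypothetical helper (not part of nspell): prefix a plain word list with
// the entry count that the .dic format expects on its first line.
function toDicSource(words) {
  return [String(words.length), ...words].join('\n')
}

// Prints the count line followed by one word per line.
console.log(toDicSource(['ultrasonogram', 'ultrasonosurgery']))
```

The resulting string has the shape that `nspell.dictionary(...)` expects, though as noted above, `.personal` is probably the safer route for plain word lists.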

But yup, I can reproduce this bug.

With this code:

const NSpell = require('nspell')
const en = require('dictionary-en')

en(function (err, maindic) {
  if (err) throw err
  const nspell = new NSpell(maindic)
  nspell.dictionary(['2', 'ultrasonogram','ultrasonosurgery'].join('\n'))

  console.log(nspell.correct('ultrasonogram')) // => true
  console.log(nspell.correct('ultrasongram')) // => true; 👎 
  console.log(nspell.correct('feleing')) // => true; 👎 

  console.log(nspell.suggest('ultrasonogram')) // => []
  console.log(nspell.suggest('ultrasongram')) // => []
  console.log(nspell.suggest('feleing')) // => []
})

...and console.log('source:', [rule, source]) right before here, I get:

source: [
  'n*1t',
  '(?:0|1|2|3|4|5|6|7|8|9)*(?:1)(?:0th|1th|2th|3th|4th|5th|6th|7th|8th|9th)'
]
source: [
  'n*mp',
  '(?:0|1|2|3|4|5|6|7|8|9)*(?:0|2|3|4|5|6|7|8|9)(?:0th|1st|2nd|3rd|4th|5th|6th|7th|8th|9th)'
]
source: [
  /(?:0|1|2|3|4|5|6|7|8|9)*(?:1)(?:0th|1th|2th|3th|4th|5th|6th|7th|8th|9th)/i,
  ''
]
source: [
  /(?:0|1|2|3|4|5|6|7|8|9)*(?:0|2|3|4|5|6|7|8|9)(?:0th|1st|2nd|3rd|4th|5th|6th|7th|8th|9th)/i,
  ''
]
true
true
true
[]
[]
[]

The problem is that the regexes fail to work: as the last two log entries show, the source ends up empty, leading to an empty regex (/(?:)/), which matches any word, so every word is marked as valid 🤔
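
The match-everything behaviour of an empty regex can be checked in isolation. This sketch uses only the built-in `RegExp` and does not touch nspell's internals; the `i` flag simply mirrors the flags visible in the log above.

```javascript
// An empty source compiles to /(?:)/, which matches at position 0 of any string.
const emptyRule = new RegExp('', 'i')

console.log(String(emptyRule)) // => /(?:)/i
console.log(emptyRule.test('feleing')) // => true
console.log(emptyRule.test('')) // => true
```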

Author

Thanks for the quick reply!

The application I'm concerned with is pulling in a ~10,000 term hunspell medical term dictionary (which does have the count as the first line, though I completely forgot about it in the repro example).

It appears to be working as expected and performing well loading the whole dictionary with .personal (after editing the dic to remove term count and the gpl license notice 😬) so I'll continue using that approach.
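
A rough sketch of that clean-up step, turning .dic text into a plain word list suitable for `.personal`. The helper name `dicToPersonal` is hypothetical, and the sketch assumes the license notice sits on lines starting with `#` and that any affix flags after a `/` can simply be dropped; a real file may need different handling.

```javascript
// Hypothetical helper (not part of nspell): reduce .dic text to one word per line.
function dicToPersonal(dicText) {
  const lines = dicText
    .split(/\r?\n/)
    .map((line) => line.trim())
    // Assumption: blank lines and `#` lines (e.g. a license notice) carry no words.
    .filter((line) => line !== '' && !line.startsWith('#'))

  // Drop the leading entry count, if present.
  if (lines.length > 0 && /^\d+$/.test(lines[0])) lines.shift()

  // Keep only the word itself. Assumption: affix flags after `/` are not needed.
  return lines.map((line) => line.split('/')[0]).join('\n')
}

console.log(dicToPersonal('# license notice\n2\nultrasonogram\nultrasonosurgery/S'))
```

The output could then be passed to `nspell.personal(...)` in place of the hand-edited file.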

I'd be interested to investigate more, though my knowledge of how the affix stuff works is zero.
