
Added option to set grammar with custom lexicon #1362

Open

wants to merge 3 commits into master
Conversation

mmende (Contributor) commented May 19, 2023

This PR adds a new API method vosk_recognizer_set_grm_with_lexicon, which allows providing a custom pronunciation lexicon in addition to a grammar. The recognizer uses this lexicon to recreate the HCLr transducer at runtime, which makes it possible to recognize words that were not in the lexicon before.

To be able to recreate the HCLr transducer, the model must be a lookahead model and include the context dependency (tree) file and the phone symbol table (phones.txt). In some rough, unscientific tests with vosk-model-small-de-0.15, the HCLr recreation took ~15 ms for 10 words, ~70 ms for 100 words, ~430 ms for 500 words, and ~1500 ms for 1000 words.

There are some hardcoded variables, such as the silence phone label (SIL), the silence probability, the self-loop scale and the transition scale, and grammar FSTs are not yet supported.

Furthermore, the method requires that the epsilon entry (<eps>) is included in the given lexicon and that phones with positional information are used correctly (if the model uses them).
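
A minimal usage sketch of how this might look from the C API. The exact signature of vosk_recognizer_set_grm_with_lexicon and the line-based "word phone phone ..." lexicon format are assumptions based on the description above; the model path and phone symbols are placeholders that would have to match the model's phones.txt:

#include <vosk_api.h>

int main(void) {
    /* Must be a lookahead model that ships the context dependency tree and phones.txt */
    VoskModel *model = vosk_model_new("vosk-model-small-de-0.15");
    VoskRecognizer *rec = vosk_recognizer_new(model, 16000.0f);

    /* Grammar as a JSON list of phrases, as with vosk_recognizer_set_grm */
    const char *grammar = "[\"licht an\", \"licht aus\"]";

    /* Assumed lexicon format: one "word phone phone ..." entry per line.
       The <eps> entry is required, and positional phone suffixes (_B/_I/_E)
       must be used if the model uses them. The symbols below are placeholders. */
    const char *lexicon =
        "<eps> SIL\n"
        "licht l_B I_I C_I t_E\n"
        "an a_B n_E\n"
        "aus aU_B s_E\n";

    /* Hypothetical call; the argument order is a guess based on the PR description */
    vosk_recognizer_set_grm_with_lexicon(rec, grammar, lexicon);

    /* ... feed audio with vosk_recognizer_accept_waveform() and read results ... */

    vosk_recognizer_free(rec);
    vosk_model_free(model);
    return 0;
}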

PS: Sorry for the whole reformatting stuff (that must have been the Clang-Format extension).

@Shallowmallow

Lovely idea, would love to use it :)
But I'm unable to compile from the lexicon branch. Is there something missing?

g++ -g -O3 -std=c++17 -Wno-deprecated-declarations -fPIC -DFST_NO_DYNAMIC_LINKING -I. -I/opt/kaldi/src -I/opt/kaldi/tools/openfst/include  -I/opt/kaldi/tools/OpenBLAS/install/include -c -o recognizer.o recognizer.cc
In file included from recognizer.h:33,
                 from recognizer.cc:15:
model.h:98:3: error: ‘ContextDependency’ does not name a type
   98 |   ContextDependency *ctx_dep_ = nullptr;
      |   ^~~~~~~~~~~~~~~~~
recognizer.cc: In member function ‘void Recognizer::RebuildLexicon(std::vector<std::__cxx11::basic_string<char> >&, std::vector<std::__cxx11::basic_string<char> >&)’:
recognizer.cc:970:16: error: ‘class Model’ has no member named ‘phone_syms_loaded_’; did you mean ‘word_syms_loaded_’?
  970 |   if (!model_->phone_syms_loaded_ || model_->ctx_dep_ == nullptr) {
      |                ^~~~~~~~~~~~~~~~~~
      |                word_syms_loaded_
recognizer.cc:970:46: error: ‘class Model’ has no member named ‘ctx_dep_’
  970 |   if (!model_->phone_syms_loaded_ || model_->ctx_dep_ == nullptr) {
      |                                              ^~~~~~~~
recognizer.cc:1091:33: error: ‘class Model’ has no member named ‘ctx_dep_’
 1091 |   int32 context_width = model_->ctx_dep_->ContextWidth();
      |                                 ^~~~~~~~
recognizer.cc:1092:36: error: ‘class Model’ has no member named ‘ctx_dep_’
 1092 |   int32 central_position = model_->ctx_dep_->CentralPosition();
      |                                    ^~~~~~~~
recognizer.cc:1102:3: error: ‘HTransducerConfig’ was not declared in this scope
 1102 |   HTransducerConfig h_cfg;
      |   ^~~~~~~~~~~~~~~~~
recognizer.cc:1103:3: error: ‘h_cfg’ was not declared in this scope
 1103 |   h_cfg.transition_scale = transition_scale;
      |   ^~~~~
recognizer.cc:1109:40: error: ‘class Model’ has no member named ‘ctx_dep_’
 1109 |       GetHTransducer(ilabels, *model_->ctx_dep_, *model_->trans_model_, h_cfg,
      |                                        ^~~~~~~~
recognizer.cc:1109:7: error: ‘GetHTransducer’ was not declared in this scope
 1109 |       GetHTransducer(ilabels, *model_->ctx_dep_, *model_->trans_model_, h_cfg,
      |       ^~~~~~~~~~~~~~
recognizer.cc:1131:59: error: no matching function for call to ‘AddSelfLoops(kaldi::TransitionModel&, std::vector<int>&, float&, bool&, bool&, fst::VectorFst<fst::ArcTpl<fst::TropicalWeightTpl<float> > >*)’
 1131 |                reorder, check_no_self_loops, &composed_fst);
      |                                                           ^
In file included from /opt/kaldi/src/fstext/pre-determinize.h:94,
                 from /opt/kaldi/src/fstext/fstext-utils-inl.h:29,
                 from /opt/kaldi/src/fstext/fstext-utils.h:425,
                 from /opt/kaldi/src/fstext/deterministic-fst-inl.h:25,
                 from /opt/kaldi/src/fstext/deterministic-fst.h:333,
                 from /opt/kaldi/src/fstext/grammar-context-fst.h:51,
                 from /opt/kaldi/src/decoder/grammar-fst.h:36,
                 from /opt/kaldi/src/decoder/lattice-faster-decoder.h:26,
                 from recognizer.h:21,
                 from recognizer.cc:15:
/opt/kaldi/src/fstext/pre-determinize-inl.h:599:26: note: candidate: ‘template<class Arc> void fst::AddSelfLoops(fst::MutableFst<Arc>*, std::vector<typename Arc::Label>&, std::vector<typename Arc::Label>&)’
  599 | template<class Arc> void AddSelfLoops(MutableFst<Arc> *fst, std::vector<typename Arc::Label> &isyms,
      |                          ^~~~~~~~~~~~
/opt/kaldi/src/fstext/pre-determinize-inl.h:599:26: note:   template argument deduction/substitution failed:
recognizer.cc:1131:59: note:   mismatched types ‘fst::MutableFst<Arc>*’ and ‘kaldi::TransitionModel’
 1131 |                reorder, check_no_self_loops, &composed_fst);
      |                                                           ^
make: *** [Makefile:112 : recognizer.o] Erreur 1


mmende (Contributor, Author) commented Aug 3, 2023

I'll try to fix it in the next few days.

mmende (Contributor, Author) commented Aug 3, 2023

@Shallowmallow you should be able to compile it now. Let me know if it worked or not...

@Shallowmallow

Indeed, it compiles. Thanks @mmende!

LexiconCode commented May 3, 2024

If there were a function to activate and deactivate multiple grammars, this could allow for context- or application-specific grammars.

The use case would be multiple programs, each with its own grammar. Client-side, when the foreground window changes, the appropriate grammar would be activated and the grammars that are not relevant would be deactivated while remaining loaded in the back-end.
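
A rough client-side sketch of that idea, using the existing vosk_recognizer_set_grm call; the per-application grammar table and the on_foreground_window_changed hook are hypothetical, not part of the Vosk API:

#include <string.h>
#include <vosk_api.h>

/* Hypothetical table of per-application grammars, all kept loaded client-side */
struct AppGrammar {
    const char *app_name;
    const char *grammar; /* JSON list of phrases */
};

static const struct AppGrammar grammars[] = {
    {"editor",  "[\"save file\", \"close tab\"]"},
    {"browser", "[\"new tab\", \"reload page\"]"},
};

/* Hypothetical hook: when the foreground window changes, activate only the
   grammar that matches the new application; the other grammars stay in the
   table but are no longer active in the recognizer. */
void on_foreground_window_changed(VoskRecognizer *rec, const char *app_name) {
    for (size_t i = 0; i < sizeof(grammars) / sizeof(grammars[0]); ++i) {
        if (strcmp(grammars[i].app_name, app_name) == 0) {
            vosk_recognizer_set_grm(rec, grammars[i].grammar);
            return;
        }
    }
}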
