On the fly contextual adaptation, #4571

KarelVesely84 · 2021-06-18T15:19:33Z

adding code for doing 'lattice boosting' and 'HCLG boosting' to do on-the fly contextual adaptation by WFST composition

this PR contains the code to do the composition, each utterance can be adapted with different FST
this PR does not contain the code to prepare the WFST 'adaptation graphs',
for HCLG boosting the graph should be simple (single state graph), for more complicated boosting graphs the HCLG o B composition is too slow...

egs/wsj/s5/steps/nnet3/decode_compose.sh

src/nnet3bin/nnet3-latgen-faster-compose.cc

kkm000 · 2021-06-18T19:16:00Z

egs/wsj/s5/steps/nnet3/decode_compose.sh

+## Set up features.
+if [ -f $srcdir/online_cmvn ]; then online_cmvn=true
+else online_cmvn=false; fi
+
+if ! $online_cmvn; then


online_cmvn is used only once.

Suggested change

## Set up features.

if [ -f $srcdir/online_cmvn ]; then online_cmvn=true

else online_cmvn=false; fi

if ! $online_cmvn; then

## Set up features.

if [[ ! -f $srcdir/online_cmvn ]]; then

i prefer the script not to deviate too much from steps/nnet3/decode.sh, people might be willing to diff against it.
but, yes it could be shorter and the original idea was that '$online_cmvn' might be used in more than one place...

kkm000 · 2021-06-18T19:22:25Z

This is a super interesting observation, thanks @vesis84!

        //       OpenFst docs say that more specific iterators
        //       are faster than generic iterators. And in HCLG
        //       is usually loaded for decoding as ConstFst.
        //
        //       auto decode_fst_ = ConstFst<StdArc>(decode_fst);
        //
        //       In this way, I tried to cast VectorFst to ConstFst,
        //       but this made the decoding 20% slower.

Very related to #4564. While we're trying to squeeze 5% out of it...

LvHang · 2021-06-19T05:09:00Z

Hm....This is interesting.

IIRC, the "ConstFst" is actually faster to read ,write and access than "VectorFast" as it has less fragmentation, so the "ReadFstKaldiGeneric" related functions are added and cast the "VectorFst" HCLG to "ConstFst" one when the graph is built.
I checked my old emails and found the following paragraph which is forwarded by Dan, someone said
"We find that const FSTs load from storage significantly faster than vector FSTs. They also take up less space in memory. Const FSTs also speed up decoding by a measurable amount, as they seem to traverse more quickly. Curiously, we see that const FSTs take up approximately 10% more space on disk when compared to vector FSTs."

For the specific iterators are faster than generic iterators, I remembered that's why the codes as follows are added

if (fst_->Type() == "const") {
  LatticeFasterDecoderTpl<fst::ConstFst<fst::StdArc>, Token> *this_cast =
    reinterpret_cast<LatticeFasterDecoderTpl<fst::ConstFst<fst::StdArc>, Token>* >(this);
  this_cast->AdvanceDecoding(decodable, max_num_frames);
....

kkm000 · 2021-06-19T06:41:51Z

@LvHang, AFAICR, ConstFst is not transformed in any way when loaded; it can be directly mapped with mmap(2) into the address space. This means it has offsets there the VectorFst has pointers. Offset arithmetics may impose certain overhead, although the 20% difference is hard to explain by it. Possibly, a loss of cache locality is at play here. A good profiler (Intel VTune is freely available now), when best of all running on real hardware (even if a VM has a vPMU, I've personally seen a lot of glitches with these; without PMU, this statistics is unavailable) will collect reliable cache miss statistics.

Also, it's important to do the comparison at the optimization turned to 11: A recent compiler, like GCC 10.2 or Clang 11 or 12, -O3 is mandatory (especially with GCC, as -O2 does not imply loop unrolling, unlike clang), and all architecture advantages taken (e.g., if you run it on Skylake-X Xeon with AVX512, specify -march=skylake-avx512; the default -msse2 does not cut it at all).

galv · 2021-06-19T16:41:33Z

src/latbin/lattice-compose-fsts.cc

+        fst::ILabelCompare<StdArc> ilabel_comp;
+        ArcSort(fst2, ilabel_comp);
+      }
+      /* // THIS MAKES ALL STATES FINAL STATES! WHY?


Does the ρ-matcher treats ε specially? I vaguely remember it doesn't. If this is so, and it falls under the "everything else" category, I can guess why it might happen.

@galv What is wrong with that ? Do you react on the present ArcSort or the disabled PropagateFinal ?

He may have been referring to the comment
// THIS MAKES ALL STATES FINAL STATES! WHY?
...
It is only relevant if the --phi-label option was specified (0 is not allowed for phi label),
in which case it will use backoff semantics.
I think it makes sense to keep that commented-out code since the phi stuff is supported.
It is intended that all states in G.fst should be final, because EOS is allowed from all states.

.. PropagateFinal() fixes the semantics of EOS, because it is done as a final-prob, not an arc, so the phi-matcher does not naturally handle it. (In k2 this kind of problem could not happen because final-probs are done as a special transition to the superfinal-state, so we wouldn't need special-purpose code).

Hi @vesis84 the main concern is that you are trying to commit code with a commented out block that you apparently don't understand, from a software engineering point of view.

Also, I thought it was odd that you say that PropagateFinal makes all states final, but fst2 is expected to be a single-state FST based on what you wrote. In this case, I would expect you to set the only state in the FST to be final anyway. Unfortunately, I don't know how phi-symbol matching works, though, so I can't really say more about what your specific problem may be.

In FST composition with ρ-matcher, when there are the ε symbols the output of the composition should be the same regardless if ε is removed or not. On the other hand, for me the runtime of composition was faster, when i removed ε from the boosting graph B in the HCLG o B composition.

It might be that there are many ε in HCLG output symbols, and it is better
to remove ε from B so that only one of the composition arguments has ε.

galv · 2021-06-19T16:44:10Z

src/nnet3bin/nnet3-latgen-faster-compose.cc

+        timer_compose.Reset();
+
+        // RmEpsilon saved 30% of composition runtime...
+        // - Note: we are loading 2-state graphs with eps back-link to the initial state.


I thought that these are one state graphs, with a bunch of self loops on that single state.

This is another reason why I would want to see a recipe to create an appropriating boost fst.

well, it could be 1 state, and i made it as 2 state with <eps> link to state 1, so that the figures look better.
semantically, it is identical...

galv · 2021-06-19T16:47:32Z

this PR does not contain the code to prepare the WFST 'adaptation graphs',

Is it a good idea to merge this PR as is without an example WFST that this is known to work with? Think about how there was an example recipe with the lookahead decoding recipe PR.

@LvHang @kkm000 I suspect something much, much simpler is going on. FST Composition creates a VectorFst as output, and that VectorFst is used only once (to decode a single utterance). Why convert to ConstFst if you're only going to run the Fst just once? Keep in mind that decoding with FSTs typically reads <10% of the FST states, so the overhead of conversion (with touches 100% of states and 100% of arcs) is not worth it here.

kkm000 · 2021-06-21T06:20:31Z

@galv, you're almost certainly right.

@vesis84, only the profiler knows the ultimate truth. When I profile a piece of sluggish code for the first time, I much more often get surprises 🎉🎉 than not. Like, InfoZip's unzip spends 90% of decompression time in libz's crc32() function. Who's expect that!

KarelVesely84 · 2021-06-21T15:37:38Z

Hi, thank you for the feedback. I adopted some of the suggestions.

Some other suggestion are pointing into the 'old' code that was taken, copied and modified.
I don't think we should deviate from the style of that original code too much,
it would be good if it stays possible to diff against the original code.

Other thing is that, sometimes the code seems to be 'over-explicit',
and this is for a good reason of simple readability. It might annoy
some people, but also help other people. So it is difficult to decide
about the shortenings. I try to stick with the style of other code
as much as possible.

I'll process your inputs and re-open later.
Thanks
Karel

danpovey · 2021-06-22T11:29:17Z

src/latbin/lattice-compose-fsts.cc

+// latbin/lattice-compose-fsts.cc
+
+// Copyright 2020  Brno University of Technology; Microsoft Corporation
+


Was this file actually intended to be added? It seess to be the same as lattice-compose, with some relatively small changes, and essentially the same interface; I don't understand why it is a new program instead of a modification to the old one.

yes, they are similar, the original lattice-compose does compose 'N lattices' with 'N lattices', or 'N lattices' with '1 FST'...

yes, it could be told externally that there are rxpecifier has 'N lattices' by default or 'N FSTs' with some boolean option like '--read-fsts-from-rxspec', but that would be lead to a more complicated interface, which goes against the principle of 'copy and modify' to keep things clear that we were using in the past...

and yes i'll do the change, if it is needed

danpovey · 2021-06-22T11:34:21Z

src/latbin/lattice-compose-fsts.cc

+        "The FST weights are interpreted as \"graph weights\" when converted into the Lattice format.\n"
+        "\n"
+        "Usage: lattice-compose-fsts [options] lattice-rspecifier1 "
+        "(fst-rspecifier2|fst-rxfilename2) lattice-wspecifier\n"


I am thinking this can be done with a string arg called e.g. "--compose-with-fst", defaulting to "auto" which is the old behavior, meaning: rspecifier=lats, rxfilename=FST; and true/True or false/False is FST or lattice respectively.

I'd like to cosult about this PropagateFinal(phi_label, fst2).
Is it needed there ?

I did not want its behavior that all the states in the FST graph were becoming final states
(2nd arg of FST composition) and the contextual adaptation was not working well due to that...

It was maybe helpful when composing lattices with lattices,
to make them better 'match' and avoid empty outputs?

Could that be also disabled by an CLI option (--propagate-final=false) ?

I'd rather modify the existing program if it can be done while clearly not affecting existing usages, just to avoid bloat.
The final propagation is needed to have expected behavior when the FST represents a backoff n-gram language model, because if you don't have that, the proper semantics of backing off to lower order states doesn't happen when it comes to final-probs. I'm not sure what your topology is.. perhaps you can describe it?

kkm000 · 2021-06-22T18:08:59Z

@vesis84

sometimes the code seems to be 'over-explicit', and this is for a good reason of simple readability. It might annoy some people, but also help other people.

Readability = good. I have not seen anyone annoyed because he had too little trouble reading code :) I've been annoyed by unreadable code myself. Unless you are referring to the comment about a stack object destroyed at end of scope, which is really excessive, I'd say. One needs to know basic C++ to read code.

I don't think we should deviate from the style of that original code too much

I think that we should do what we say we do and follow the coding style, even though we have old code which doesn't, and fix the old code when we touch it. Or stop saying that we have the standard. Being of two minds in this respect is the worst.

One's logical order of includes is another's illogical order. Alphabetical order is unambiguous. Imagine I edit something unrelated in this file and also casually rearrange includes because I think my order is more logical. You then rearrange them again. Last thing I want to see is the battle of commits. Unless you have any idea how to unambiguously standardize the "logical order," it's better to stick tight to written guidelines.

One day I'll unleash clang-format on the codebase, and it will ultimately squish all style arguments. Which is a good thing, as I want to spend the first day of the rest of my life kaldying, not arguing about our coding style. :)

danpovey · 2021-06-23T06:32:18Z

that ClassifierRspecifier() call has no side effects, at least if last 2 args are null, so that should be OK.

…

On Wed, Jun 23, 2021 at 3:04 AM kkm000 ***@***.***> wrote: ***@***.**** commented on this pull request. ------------------------------ In src/nnet3bin/nnet3-latgen-faster-compose.cc <#4571 (comment)>: > + if (word_syms_filename != "") { + word_syms.reset(fst::SymbolTable::ReadText(word_syms_filename)); + if (!word_syms) + KALDI_ERR << "Could not read symbol table from file " + << word_syms_filename; + } + + double tot_like = 0.0; + kaldi::int64 frame_count = 0; + int num_success = 0, num_fail = 0; + // this compiler object allows caching of computations across + // different utterances. + CachingOptimizingCompiler compiler(am_nnet.GetNnet(), + decodable_opts.optimize_config); + + KALDI_ASSERT(ClassifyRspecifier(hclg_fst_rxfilename, NULL, NULL) == kNoRspecifier); Oh. This is a bug. Don't put any code inside the ASSERT. It compiles to nothing if NDEBUG is defined. https://github.com/kaldi-asr/kaldi/blob/9d235864c3105c3b72feb9f19a219dbae08b3a41/src/base/kaldi-error.h#L184-L194 — You are receiving this because your review was requested. Reply to this email directly, view it on GitHub <#4571 (comment)>, or unsubscribe <https://github.com/notifications/unsubscribe-auth/AAZFLO3IHP2XVO43JZTANVTTUDNCLANCNFSM4653Y6WA> .

KarelVesely84 · 2021-06-23T13:56:57Z

Okay, i give up for the moment. And Dan's comment about extending lattice-compose is right.
I'll work on it later in a separate branch. And working contextual adaptation on lattice-level will be a good start.

One more thing, when i was going through Google papers, i noticed they are using sigma-matcher for contextual adaptation.
The boosting graph contains just the text snippets to be boosted and a sigma arc to match everything else.

I am not sure yet, how difficult would it be to add support of sigma-composition to kaldi for composing
a lattice with FST having a sigma-arc. But the 'wirings' might be analogical to the phi-composition that
already is there... So, maybe one day... Let's see...

daanzu · 2021-06-24T14:24:19Z

@vesis84 I am quite interested in this feature, and appreciate you making the PR. Regardless of whether it gets merged right now, I would be curious to see an full example including the "adaptation graph" FST. Even seeing a pre-prepared FST would be enlightening, if you don't want to release the code to generate them.

stale · 2021-08-23T20:18:26Z

This issue has been automatically marked as stale by a bot solely because it has not had recent activity. Please add any comment (simply 'ping' is enough) to prevent the issue from being closed for 60 more days if you believe it should be kept open.

- This is a follow-up of kaldi-asr#4571 - Refactoring 'lattice-compose.cc' to support composition with ark of fsts, so that it is done as Dan suggested before: I am thinking this can be done with a string arg called e.g. "--compose-with-fst", defaulting to "auto" which is the old behavior, meaning: rspecifier=lats, rxfilename=FST; and true/True or false/False is FST or lattice respectively. - I added there possibility of rho-composition, which is useful for biasing lattices with word-sequences. Thanks to rho-composition, the biasing graph does not need to contain all words from lexicon. - Would you be interested in an example how to use this? (i.e. create graphs from text file with python script using openfst as library, but that would need to change build of openfst to enable python extensions) - Also which 'egs' recipe would be convenient to use it with?

danpovey · 2022-01-30T02:47:43Z

I'm going to see if I can find someone to address the issues in this PR. I think it is an important topic that we should pursue.

… support RhoMatcher (#4692) * Extending 'lattice-compose.cc' to compose with ark of fsts, - This is a follow-up of #4571 - Refactoring 'lattice-compose.cc' to support composition with ark of fsts, so that it is done as Dan suggested before: I am thinking this can be done with a string arg called e.g. "--compose-with-fst", defaulting to "auto" which is the old behavior, meaning: rspecifier=lats, rxfilename=FST; and true/True or false/False is FST or lattice respectively. - I added there possibility of rho-composition, which is useful for biasing lattices with word-sequences. Thanks to rho-composition, the biasing graph does not need to contain all words from lexicon. - Would you be interested in an example how to use this? (i.e. create graphs from text file with python script using openfst as library, but that would need to change build of openfst to enable python extensions) - Also which 'egs' recipe would be convenient to use it with? * lattice-compose.cc, resolving remarks from PR #4692 * fixing issue in std::transform with std::tolower, suggesting variant of overloaded function * lattice-compose, extending the rho explanation

…osting graph

Adding suggestions from code review. Co-authored-by: Cy 'kkm' Katsnelson <kkm@pobox.com>

…rho.sh script

kkm000 · 2022-05-09T14:41:26Z

Ahoj, @vesis84, why? This is a very useful feature. Was it merged in another PR?

KarelVesely84 · 2022-05-10T15:06:02Z

Hi,
i thought it was not suitable for integration and stale PR.
Also, part of this PR was reworked and integrated into #4692 (the lattice boosting part).

The remainder is the HCLG boosting part. It worked for me in the paper with noisy input data:
http://www.fit.vutbr.cz/research/groups/speech/publi/2021/kocour21_interspeech.pdf
For HCLG boosting, only isolated words could be boosted. For more
complicated graphs, the composition would be very slow.

I was doing the composition here:
https://github.com/vesis84/kaldi/blob/atco2/src/nnet3bin/nnet3-latgen-faster-compose.cc#L239
For this, the HCLG graph is internally converted from ConstFst to VectorFst,
so it becomes mutable.

The HCLG boosting can also be dangerous. If the boosting values are too big,
the recognition output becomes very bad easily. Lattice boosting usually
produces decent results even with high boosting values.

Are you sure, you want to continue with the HCLG boosting ?
Or what should be the direction to continue ?
I can work on that... Let me know what is the intention...
(i can remove the lattice boosting part from the PR, create some examples...)

K.

jtrmal · 2022-05-10T15:09:58Z

I'll try to respond this week if @danpovey or @kkm won't have more specific ideas (sooner :) y.

…

On Tue, May 10, 2022 at 11:06 AM Karel Vesely ***@***.***> wrote: Reopened #4571 <#4571>. — Reply to this email directly, view it on GitHub <#4571 (comment)>, or unsubscribe <https://github.com/notifications/unsubscribe-auth/ACUKYXZ22GTCJQ5AXDDJZL3VJJ3QBANCNFSM4653Y6WA> . You are receiving this because you are subscribed to this thread.Message ID: ***@***.***>

kkm000 · 2022-10-11T09:09:29Z

I think that boosting individual words would be a great thing already! We used Kaldi decoder to automate customer service for multiple clients, including clothing manufacturers and road assistance. The road assistance application sometimes spit out stuff like "I need to change attire." A simple lexicon boosting would have resolved that. More significant HCLG changes on the fly would be likely expensive.

jtrmal · 2022-10-12T18:57:57Z

yeah, I agree

…

On Tue, Oct 11, 2022 at 5:09 AM kkm000 ***@***.***> wrote: I think that boosting individual words would be a great thing already! We used Kaldi decoder to automate customer service for multiple clients, including clothing manufacturers and road assistance. The road assistance application sometimes spit out stuff like "I need to change attire." A simple lexicon boosting would have resolved that. More significant HCLG changes on the fly would be likely expensive. — Reply to this email directly, view it on GitHub <#4571 (comment)>, or unsubscribe <https://github.com/notifications/unsubscribe-auth/ACUKYX3KKI2RCRQJ7Y4UE2TWCUVFJANCNFSM4653Y6WA> . You are receiving this because you commented.Message ID: ***@***.***>

jtrmal · 2022-10-12T19:32:07Z

@vesis84 do you have the bandwidth to work on this? it would be a welcome feature

KarelVesely84 requested a review from danpovey June 18, 2021 15:19

kkm000 requested changes Jun 18, 2021

View reviewed changes

galv reviewed Jun 19, 2021

View reviewed changes

danpovey reviewed Jun 22, 2021

View reviewed changes

stale bot added the stale Stale bot on the loose label Aug 23, 2021

kkm000 added stale-exclude Stale bot ignore this issue stopped development On backburner. Please comment if you think it should not be. and removed stale Stale bot on the loose labels Sep 22, 2021

KarelVesely84 mentioned this pull request Jan 25, 2022

Extending 'lattice-compose.cc' to compose with ark of fsts, #4692

Merged

KarelVesely84 and others added 5 commits January 31, 2022 15:49

Tool for lattice rescoring by composing with per-utterance FSTs.

f2b6724

nnet3-latgen-faster-compose, online composition of HCLG graph with bo…

ecb0ce7

…osting graph

adding script for on-the-fly boosting of HCLG graph

d87ec23

updating the HCLG boosting code, debugging

f9ae938

Apply suggestions from code review

0e31ffa

Adding suggestions from code review. Co-authored-by: Cy 'kkm' Katsnelson <kkm@pobox.com>

KarelVesely84 force-pushed the atco2 branch from 60fa225 to 0e31ffa Compare January 31, 2022 14:51

nnet3-latgen-faster-compose, add RhoCompose and nnet3/decode_compose_…

634a715

…rho.sh script

KarelVesely84 closed this May 9, 2022

KarelVesely84 reopened this May 10, 2022

		// latbin/lattice-compose-fsts.cc

		// Copyright 2020 Brno University of Technology; Microsoft Corporation

On the fly contextual adaptation, #4571

Are you sure you want to change the base?

On the fly contextual adaptation, #4571

Conversation

KarelVesely84 commented Jun 18, 2021

kkm000 Jun 18, 2021 • edited

Choose a reason for hiding this comment

KarelVesely84 Jun 21, 2021 • edited

Choose a reason for hiding this comment

kkm000 commented Jun 18, 2021

LvHang commented Jun 19, 2021

kkm000 commented Jun 19, 2021

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

KarelVesely84 Jun 22, 2021 • edited

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

KarelVesely84 Jun 21, 2021 • edited

Choose a reason for hiding this comment

galv commented Jun 19, 2021

kkm000 commented Jun 21, 2021

KarelVesely84 commented Jun 21, 2021 • edited

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

kkm000 commented Jun 22, 2021

danpovey commented Jun 23, 2021 via email

KarelVesely84 commented Jun 23, 2021

daanzu commented Jun 24, 2021

stale bot commented Aug 23, 2021

danpovey commented Jan 30, 2022

kkm000 commented May 9, 2022

KarelVesely84 commented May 10, 2022 • edited

jtrmal commented May 10, 2022 via email

kkm000 commented Oct 11, 2022 via email

jtrmal commented Oct 12, 2022 via email

jtrmal commented Oct 12, 2022

kkm000 Jun 18, 2021 •

edited

KarelVesely84 Jun 21, 2021 •

edited

KarelVesely84 Jun 22, 2021 •

edited

KarelVesely84 Jun 21, 2021 •

edited

KarelVesely84 commented Jun 21, 2021 •

edited

KarelVesely84 commented May 10, 2022 •

edited