
Plda Scoring #548

Open · wants to merge 65 commits into master

Conversation

@sadrasabouri (Contributor) commented May 22, 2021

This was initially raised as an issue by @gooran and is mainly inspired by a similar task in lid_kaldi.

Merging this pull request will introduce the following changes:

ADDED:

  • KaldiRecognizer::PldaScoring added to kaldi_recognizer.cc
  • plda added to spk_model.h
  • plda_config added to spk_model.h
  • plda_rxfilename added to spk_model.h
  • vad_opts added to spk_model.h
  • num_utts added to spk_model.h
  • train_ivectors added to spk_model.h
  • train_ivector_rspecifier added to spk_model.h
  • num_utts_rspecifier added to spk_model.h
  • sorted_scores method added to test_speaker.py (see the sketch after this list)
  • spk_sig vector changed to match the model dimension (dim=128)
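
As a rough illustration, a sorted_scores-style helper (a hedged Python sketch, not the actual code added to test_speaker.py) could order the new scores field like this:

def sorted_scores(result):
    # `result` is the dict parsed from KaldiRecognizer.Result().
    # Returns the PLDA scores sorted best match first.
    return sorted(result.get("scores", []), key=lambda s: s["score"], reverse=True)

# Example:
# sorted_scores({"scores": [{"speaker": "spk0", "score": -3.5},
#                           {"speaker": "spk1", "score": 8.1}]})
# -> [{"speaker": "spk1", "score": 8.1}, {"speaker": "spk0", "score": -3.5}]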

After this PR, each utterance goes through a PLDA scoring step after x-vector extraction, which scores the likelihood that the test utterance and each of the training x-vectors belong to the same speaker. This happens automatically for every utterance, and the JSON result contains a new field called scores, which looks like this:

[
...
"scores" :
    [
    {"speaker": "spk0", "score": -3.518532},
    {"speaker": "spk1", "score": 8.106313},
    ...
    ]
]

There is also a small edit to the spk field, which returns the x-vector of the latest utterance. After this PR, this field is filled by the PldaScoring method as part of its PLDA computation.
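
For context, here is a hedged sketch of how a client could consume the new scores and spk fields through the Python API (the model directories "model" and "model-spk" and the file "test.wav" are placeholder assumptions, not part of this PR):

import json
import wave

from vosk import Model, KaldiRecognizer, SpkModel

model = Model("model")             # placeholder ASR model path
spk_model = SpkModel("model-spk")  # placeholder speaker model path

wf = wave.open("test.wav", "rb")   # placeholder audio file
rec = KaldiRecognizer(model, wf.getframerate(), spk_model)

while True:
    data = wf.readframes(4000)
    if len(data) == 0:
        break
    if rec.AcceptWaveform(data):
        res = json.loads(rec.Result())
        # "scores" is the field added by this PR; "spk" holds the
        # utterance x-vector filled in by PldaScoring.
        if "scores" in res:
            best = max(res["scores"], key=lambda s: s["score"])
            print("best match:", best["speaker"], best["score"])

res = json.loads(rec.FinalResult())
if "scores" in res:
    print([(s["speaker"], s["score"]) for s in res["scores"]])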


I've tested this feature with the model files below (prepared using the following Kaldi SRE16 recipe), and everything seems normal:

#!/usr/bin/env bash
# Copyright      2017   David Snyder
#                2017   Johns Hopkins University (Author: Daniel Garcia-Romero)
#                2017   Johns Hopkins University (Author: Daniel Povey)
# Apache 2.0.
#
# See README.txt for more info on data required.
# Results (mostly EERs) are inline in comments below.
#
# This example demonstrates a "bare bones" NIST SRE 2016 recipe using xvectors.
# It is closely based on "X-vectors: Robust DNN Embeddings for Speaker
# Recognition" by Snyder et al.  In the future, we will add score-normalization
# and a more effective form of PLDA domain adaptation.
#
# Pretrained models are available for this recipe.  See
# http://kaldi-asr.org/models.html and
# https://david-ryan-snyder.github.io/2017/10/04/model_sre16_v2.html
# for details.

. ./cmd.sh
. ./path.sh
set -e
mfccdir=`pwd`/mfcc
vaddir=`pwd`/mfcc


sre16_trials=data/sre16_eval_test/trials
nnet_dir=exp/xvector_nnet_1a

stage=9

if [ $stage -le 1 ]; then
  # Make MFCCs and compute the energy-based VAD for each dataset
  for name in sre16_major sre16_eval_test sre16_eval_enroll; do
    steps/make_mfcc.sh --write-utt2num-frames true --mfcc-config conf/mfcc.conf --nj 20 --cmd "$train_cmd" \
      data/${name} exp/make_mfcc $mfccdir
    utils/fix_data_dir.sh data/${name}
    sid/compute_vad_decision.sh --nj 20 --cmd "$train_cmd" \
      data/${name} exp/make_vad $vaddir
    utils/fix_data_dir.sh data/${name}
  done
fi


if [ $stage -le 7 ]; then
  # The SRE16 major is an unlabeled dataset consisting of Cantonese and
  # Tagalog.  This is useful for things like centering, whitening and
  # score normalization.
  sid/nnet3/xvector/extract_xvectors.sh --cmd "$train_cmd --mem 6G" --nj 20 \
    $nnet_dir data/sre16_major \
    exp/xvectors_sre16_major

  # The SRE16 test data
  sid/nnet3/xvector/extract_xvectors.sh --cmd "$train_cmd --mem 6G" --nj 20 \
    $nnet_dir data/sre16_eval_test \
    exp/xvectors_sre16_eval_test

  # The SRE16 enroll data
  sid/nnet3/xvector/extract_xvectors.sh --cmd "$train_cmd --mem 6G" --nj 20 \
    $nnet_dir data/sre16_eval_enroll \
    exp/xvectors_sre16_eval_enroll
fi

if [ $stage -le 9 ]; then
  # Get results using the out-of-domain PLDA model.
  $train_cmd exp/scores/log/sre16_eval_scoring.log \
    ivector-plda-scoring --normalize-length=true \
    --num-utts=ark:exp/xvectors_sre16_eval_enroll/num_utts.ark \
    "ivector-copy-plda --smoothing=0.0 exp/xvectors_sre_combined/plda - |" \
    "ark:ivector-mean ark:data/sre16_eval_enroll/spk2utt scp:exp/xvectors_sre16_eval_enroll/xvector.scp ark:- | ivector-subtract-global-mean exp/xvectors_sre16_major/mean.vec ark:- ark:- | transform-vec exp/xvectors_sre_combined/transform.mat ark:- ark:- | ivector-normalize-length ark:- ark:- |" \
    "ark:ivector-subtract-global-mean exp/xvectors_sre16_major/mean.vec scp:exp/xvectors_sre16_eval_test/xvector.scp ark:- | transform-vec exp/xvectors_sre_combined/transform.mat ark:- ark:- | ivector-normalize-length ark:- ark:- |" \
    "cat '$sre16_trials' | cut -d\  --fields=1,2 |" exp/scores/sre16_eval_scores || exit 1;

  pooled_eer=$(paste $sre16_trials exp/scores/sre16_eval_scores | awk '{print $6, $3}' | compute-eer - 2>/dev/null)

  echo "Using Out-of-Domain PLDA, EER: Pooled ${pooled_eer}%"
  # EER: Pooled 11.73%, Tagalog 15.96%, Cantonese 7.52%
  # For reference, here's the ivector system from ../v1:
  # EER: Pooled 13.65%, Tagalog 17.73%, Cantonese 9.61%
fi

if [ $stage -le 10 ]; then
  $train_cmd copy_plda.log ivector-copy-plda --smoothing=0.0 exp/xvectors_sre16_major/plda_adapt exp/xvectors_sre16_major/plda_adapt.smooth0.1
  $train_cmd log.1.log ivector-mean ark:data/sre16_eval_enroll/spk2utt scp:exp/xvectors_sre16_eval_enroll/xvector.scp ark:exp/xvectors_sre16_eval_test/xvector.55.scp ark:exp/xvectors_sre16_eval_enroll/num_utts_.ark
  $train_cmd log.2.log ivector-subtract-global-mean exp/xvectors_sre16_major/mean.vec ark:exp/xvectors_sre16_eval_test/xvector.55.scp ark:exp/xvectors_sre16_eval_test/xvector.66.scp
  $train_cmd log.3.log transform-vec exp/xvectors_sre_combined/transform.mat ark:exp/xvectors_sre16_eval_test/xvector.66.scp ark:exp/xvectors_sre16_eval_test/xvector.77.scp
  $train_cmd log.4.log ivector-normalize-length ark:exp/xvectors_sre16_eval_test/xvector.77.scp  ark:exp/xvectors_sre16_eval_test/xvector.final.train.scp
  # Get results using the adapted PLDA model.
  $train_cmd exp/scores/log/sre16_eval_scoring_adapt.log \
    ivector-plda-scoring --normalize-length=true \
    --num-utts=ark:exp/xvectors_sre16_eval_enroll/num_utts.ark \
    "ivector-copy-plda --smoothing=0.0 exp/xvectors_sre16_major/plda_adapt - |" \
    "ark:ivector-mean ark:data/sre16_eval_enroll/spk2utt scp:exp/xvectors_sre16_eval_enroll/xvector.scp ark:- | ivector-subtract-global-mean exp/xvectors_sre16_major/mean.vec ark:- ark:- | transform-vec exp/xvectors_sre_combined/transform.mat ark:- ark:- | ivector-normalize-length ark:- ark:- |" \
    "ark:ivector-subtract-global-mean exp/xvectors_sre16_major/mean.vec scp:exp/xvectors_sre16_eval_test/xvector.scp ark:- | transform-vec exp/xvectors_sre_combined/transform.mat ark:- ark:- | ivector-normalize-length ark:- ark:- |" \
    "cat '$sre16_trials' | cut -d\  --fields=1,2 |" exp/scores/sre16_eval_scores_adapt || exit 1;

  pooled_eer=$(paste $sre16_trials exp/scores/sre16_eval_scores_adapt | awk '{print $6, $3}' | compute-eer - 2>/dev/null)
  echo "Using Adapted PLDA, EER: Pooled ${pooled_eer}%"
  # EER: Pooled 8.57%, Tagalog 12.29%, Cantonese 4.89%
  # For reference, here's the ivector system from ../v1:
  # EER: Pooled 12.98%, Tagalog 17.8%, Cantonese 8.35%
  #
  # Using the official SRE16 scoring software, we obtain the following equalized results:
  #
  # -- Pooled --
  #  EER:          8.66
  #  min_Cprimary: 0.61
  #  act_Cprimary: 0.62
  #
  # -- Cantonese --
  # EER:           4.69
  # min_Cprimary:  0.42
  # act_Cprimary:  0.43
  #
  # -- Tagalog --
  # EER:          12.63
  # min_Cprimary:  0.76
  # act_Cprimary:  0.81
fi
  1. final.ext.raw : extracted version of the sre16 model, produced with the command below:
     nnet3-copy --nnet-config=extract.config final.raw final.ext.raw
  2. mfcc.conf : MFCC config file
  3. plda_adapt.smooth0.1 : smoothed version of the PLDA model
  4. spk_xvectors.ark : trained speakers' x-vector archive file
  5. vad.conf : VAD config file
  6. mean.vec : mean vector
  7. num_utts.ark : number of utterances associated with each speaker
  8. README.md : README file
  9. transform.mat : transformation matrix
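
As a small hedged sketch (the directory name "model-spk" is a placeholder), one could verify that a speaker-model directory contains the files listed above before loading it:

import os

# Files listed above that the PLDA-based speaker model directory is expected to contain.
EXPECTED_FILES = [
    "final.ext.raw", "mfcc.conf", "plda_adapt.smooth0.1",
    "spk_xvectors.ark", "vad.conf", "mean.vec",
    "num_utts.ark", "README.md", "transform.mat",
]

def check_spk_model_dir(path="model-spk"):  # placeholder path
    missing = [f for f in EXPECTED_FILES
               if not os.path.exists(os.path.join(path, f))]
    if missing:
        print("missing model files:", ", ".join(missing))
    return not missing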

Any comments and enhancements are happily welcome.

@sadrasabouri sadrasabouri marked this pull request as ready for review June 9, 2021 19:15
@sadrasabouri sadrasabouri marked this pull request as draft June 9, 2021 19:17
@sadrasabouri sadrasabouri marked this pull request as ready for review July 12, 2021 19:00
@sadrasabouri sadrasabouri changed the title from "Plda Scoring and VAD added" to "Plda Scoring" on Jul 12, 2021

@gooran left a comment


Well done!

@@ -397,7 +397,8 @@ bool KaldiRecognizer::GetSpkVector(Vector<BaseFloat> &out_xvector, int *num_spk_
// xvector_result is filled with xvector for PldaScoring process
xvector_result = xvector;
// out_xvector will be filled by PldaScoring method from utterance
// xvector after transformation
// xvector before transformation so that it can be used for new
// users enrollment
PldaScoring(out_xvector);

Why do you get this out_xvector? Only for enrollment? I think it may be better to add a specific method for this task.

Contributor Author


out_xvector is passed by reference to this function and is filled so that it can be sent to the user as the spk field.
Yes, it can be used for enrollment; in that case it should be added to spk_xvectors.ark in ark format.
Defining a new function would introduce redundant computation of the x-vector, while PldaScoring already computes these values once.


It may be best to review the entire speaker enrollment and scoring scenario once more. Using one function for two purposes is not ideal.


@gooran left a comment


I think it is better to distinguish between the speaker recognition and speech recognition tasks in the general structure. It may be best to have two separate recognizer modules for these two tasks.
