Hello, I have an issue while running the notebook with the msdd_model.diarize() method:
[NeMo I 2023-11-24 10:01:36 msdd_models:1092] Loading pretrained diar_msdd_telephonic model from NGC
[NeMo I 2023-11-24 10:01:36 cloud:58] Found existing object /root/.cache/torch/NeMo/NeMo_1.20.0/diar_msdd_telephonic/3c3697a0a46f945574fa407149975a13/diar_msdd_telephonic.nemo.
[NeMo I 2023-11-24 10:01:36 cloud:64] Re-using file from: /root/.cache/torch/NeMo/NeMo_1.20.0/diar_msdd_telephonic/3c3697a0a46f945574fa407149975a13/diar_msdd_telephonic.nemo
[NeMo I 2023-11-24 10:01:36 common:913] Instantiating model from pre-trained checkpoint
[NeMo W 2023-11-24 10:01:38 modelPT:161] If you intend to do training or fine-tuning, please call the ModelPT.setup_training_data() method and provide a valid configuration file to setup the train data loader.
Train config :
manifest_filepath: null
emb_dir: null
sample_rate: 16000
num_spks: 2
soft_label_thres: 0.5
labels: null
batch_size: 15
emb_batch_size: 0
shuffle: true
[NeMo W 2023-11-24 10:01:38 modelPT:168] If you intend to do validation, please call the ModelPT.setup_validation_data() or ModelPT.setup_multiple_validation_data() method and provide a valid configuration file to setup the validation data loader(s).
Validation config :
manifest_filepath: null
emb_dir: null
sample_rate: 16000
num_spks: 2
soft_label_thres: 0.5
labels: null
batch_size: 15
emb_batch_size: 0
shuffle: false
[NeMo W 2023-11-24 10:01:38 modelPT:174] Please call the ModelPT.setup_test_data() or ModelPT.setup_multiple_test_data() method and provide a valid configuration file to setup the test data loader(s).
Test config :
manifest_filepath: null
emb_dir: null
sample_rate: 16000
num_spks: 2
soft_label_thres: 0.5
labels: null
batch_size: 15
emb_batch_size: 0
shuffle: false
seq_eval_mode: false
[NeMo I 2023-11-24 10:01:38 features:289] PADDING: 16
[NeMo I 2023-11-24 10:01:38 features:289] PADDING: 16
[NeMo I 2023-11-24 10:01:39 save_restore_connector:249] Model EncDecDiarLabelModel was successfully restored from /root/.cache/torch/NeMo/NeMo_1.20.0/diar_msdd_telephonic/3c3697a0a46f945574fa407149975a13/diar_msdd_telephonic.nemo.
[NeMo I 2023-11-24 10:01:39 features:289] PADDING: 16
[NeMo I 2023-11-24 10:01:40 clustering_diarizer:127] Loading pretrained vad_multilingual_marblenet model from NGC
[NeMo I 2023-11-24 10:01:40 cloud:58] Found existing object /root/.cache/torch/NeMo/NeMo_1.20.0/vad_multilingual_marblenet/670f425c7f186060b7a7268ba6dfacb2/vad_multilingual_marblenet.nemo.
[NeMo I 2023-11-24 10:01:40 cloud:64] Re-using file from: /root/.cache/torch/NeMo/NeMo_1.20.0/vad_multilingual_marblenet/670f425c7f186060b7a7268ba6dfacb2/vad_multilingual_marblenet.nemo
[NeMo I 2023-11-24 10:01:40 common:913] Instantiating model from pre-trained checkpoint
[NeMo W 2023-11-24 10:01:40 modelPT:161] If you intend to do training or fine-tuning, please call the ModelPT.setup_training_data() method and provide a valid configuration file to setup the train data loader.
Train config :
manifest_filepath: /manifests/ami_train_0.63.json,/manifests/freesound_background_train.json,/manifests/freesound_laughter_train.json,/manifests/fisher_2004_background.json,/manifests/fisher_2004_speech_sampled.json,/manifests/google_train_manifest.json,/manifests/icsi_all_0.63.json,/manifests/musan_freesound_train.json,/manifests/musan_music_train.json,/manifests/musan_soundbible_train.json,/manifests/mandarin_train_sample.json,/manifests/german_train_sample.json,/manifests/spanish_train_sample.json,/manifests/french_train_sample.json,/manifests/russian_train_sample.json
sample_rate: 16000
labels:
- background
- speech
batch_size: 256
shuffle: true
is_tarred: false
tarred_audio_filepaths: null
tarred_shard_strategy: scatter
augmentor:
shift:
prob: 0.5
min_shift_ms: -10.0
max_shift_ms: 10.0
white_noise:
prob: 0.5
min_level: -90
max_level: -46
norm: true
noise:
prob: 0.5
manifest_path: /manifests/noise_0_1_musan_fs.json
min_snr_db: 0
max_snr_db: 30
max_gain_db: 300.0
norm: true
gain:
prob: 0.5
min_gain_dbfs: -10.0
max_gain_dbfs: 10.0
norm: true
num_workers: 16
pin_memory: true
[NeMo W 2023-11-24 10:01:40 modelPT:168] If you intend to do validation, please call the ModelPT.setup_validation_data() or ModelPT.setup_multiple_validation_data() method and provide a valid configuration file to setup the validation data loader(s).
Validation config :
manifest_filepath: /manifests/ami_dev_0.63.json,/manifests/freesound_background_dev.json,/manifests/freesound_laughter_dev.json,/manifests/ch120_moved_0.63.json,/manifests/fisher_2005_500_speech_sampled.json,/manifests/google_dev_manifest.json,/manifests/musan_music_dev.json,/manifests/mandarin_dev.json,/manifests/german_dev.json,/manifests/spanish_dev.json,/manifests/french_dev.json,/manifests/russian_dev.json
sample_rate: 16000
labels:
- background
- speech
batch_size: 256
shuffle: false
val_loss_idx: 0
num_workers: 16
pin_memory: true
[NeMo W 2023-11-24 10:01:40 modelPT:174] Please call the ModelPT.setup_test_data() or ModelPT.setup_multiple_test_data() method and provide a valid configuration file to setup the test data loader(s).
Test config :
manifest_filepath: null
sample_rate: 16000
labels:
- background
- speech
batch_size: 128
shuffle: false
test_loss_idx: 0
[NeMo I 2023-11-24 10:01:40 features:289] PADDING: 16
[NeMo I 2023-11-24 10:01:40 save_restore_connector:249] Model EncDecClassificationModel was successfully restored from /root/.cache/torch/NeMo/NeMo_1.20.0/vad_multilingual_marblenet/670f425c7f186060b7a7268ba6dfacb2/vad_multilingual_marblenet.nemo.
[NeMo I 2023-11-24 10:01:40 msdd_models:864] Multiscale Weights: [1, 1, 1, 1, 1, 1]
[NeMo I 2023-11-24 10:01:40 msdd_models:865] Clustering Parameters: {
"oracle_num_speakers": false,
"max_num_speakers": 8,
"enhanced_count_thres": 80,
"max_rp_threshold": 0.25,
"sparse_search_volume": 30,
"maj_vote_spk_count": false,
"chunk_cluster_count": 50,
"embeddings_per_chunk": 10000
}
[NeMo W 2023-11-24 10:01:40 clustering_diarizer:411] Deleting previous clustering diarizer outputs.
[NeMo I 2023-11-24 10:01:40 speaker_utils:93] Number of files to diarize: 1
[NeMo I 2023-11-24 10:01:40 clustering_diarizer:309] Split long audio file to avoid CUDA memory issue
splitting manifest: 100%|██████████| 1/1 [00:00<00:00, 1.88it/s]
[NeMo I 2023-11-24 10:01:41 vad_utils:107] The prepared manifest file exists. Overwriting!
[NeMo I 2023-11-24 10:01:41 classification_models:272] Perform streaming frame-level VAD
[NeMo I 2023-11-24 10:01:41 collections:301] Filtered duration for loading collection is 0.00 hours.
[NeMo I 2023-11-24 10:01:41 collections:302] Dataset loaded with 12 items, total duration of 0.16 hours.
[NeMo I 2023-11-24 10:01:41 collections:304] # 12 files loaded accounting to # 1 labels
vad: 100%|██████████| 12/12 [00:04<00:00, 2.46it/s]
[NeMo I 2023-11-24 10:01:46 clustering_diarizer:262] Converting frame level prediction to speech/no-speech segment in start and end times format.
creating speech segments: 100%|██████████| 1/1 [00:00<00:00, 2.43it/s]
[NeMo I 2023-11-24 10:01:46 clustering_diarizer:287] Subsegmentation for embedding extraction: scale0, /content/temp_outputs/speaker_outputs/subsegments_scale0.json
[NeMo I 2023-11-24 10:01:46 clustering_diarizer:343] Extracting embeddings for Diarization
[NeMo I 2023-11-24 10:01:46 collections:301] Filtered duration for loading collection is 0.00 hours.
[NeMo I 2023-11-24 10:01:46 collections:302] Dataset loaded with 381 items, total duration of 0.32 hours.
[NeMo I 2023-11-24 10:01:46 collections:304] # 381 files loaded accounting to # 1 labels
[1/6] extract embeddings: 100%|██████████| 6/6 [00:01<00:00, 3.13it/s]
[NeMo I 2023-11-24 10:01:48 clustering_diarizer:389] Saved embedding files to /content/temp_outputs/speaker_outputs/embeddings
[NeMo I 2023-11-24 10:01:48 clustering_diarizer:287] Subsegmentation for embedding extraction: scale1, /content/temp_outputs/speaker_outputs/subsegments_scale1.json
[NeMo I 2023-11-24 10:01:48 clustering_diarizer:343] Extracting embeddings for Diarization
[NeMo I 2023-11-24 10:01:48 collections:301] Filtered duration for loading collection is 0.00 hours.
[NeMo I 2023-11-24 10:01:48 collections:302] Dataset loaded with 457 items, total duration of 0.32 hours.
[NeMo I 2023-11-24 10:01:48 collections:304] # 457 files loaded accounting to # 1 labels
[2/6] extract embeddings: 100%|██████████| 8/8 [00:02<00:00, 3.86it/s]
[NeMo I 2023-11-24 10:01:50 clustering_diarizer:389] Saved embedding files to /content/temp_outputs/speaker_outputs/embeddings
[NeMo I 2023-11-24 10:01:50 clustering_diarizer:287] Subsegmentation for embedding extraction: scale2, /content/temp_outputs/speaker_outputs/subsegments_scale2.json
[NeMo I 2023-11-24 10:01:50 clustering_diarizer:343] Extracting embeddings for Diarization
[NeMo I 2023-11-24 10:01:50 collections:301] Filtered duration for loading collection is 0.00 hours.
[NeMo I 2023-11-24 10:01:50 collections:302] Dataset loaded with 574 items, total duration of 0.32 hours.
[NeMo I 2023-11-24 10:01:50 collections:304] # 574 files loaded accounting to # 1 labels
[3/6] extract embeddings: 100%|██████████| 9/9 [00:02<00:00, 3.30it/s]
[NeMo I 2023-11-24 10:01:53 clustering_diarizer:389] Saved embedding files to /content/temp_outputs/speaker_outputs/embeddings
[NeMo I 2023-11-24 10:01:53 clustering_diarizer:287] Subsegmentation for embedding extraction: scale3, /content/temp_outputs/speaker_outputs/subsegments_scale3.json
[NeMo I 2023-11-24 10:01:53 clustering_diarizer:343] Extracting embeddings for Diarization
[NeMo I 2023-11-24 10:01:53 collections:301] Filtered duration for loading collection is 0.00 hours.
[NeMo I 2023-11-24 10:01:53 collections:302] Dataset loaded with 764 items, total duration of 0.32 hours.
[NeMo I 2023-11-24 10:01:53 collections:304] # 764 files loaded accounting to # 1 labels
[4/6] extract embeddings: 100%|██████████| 12/12 [00:03<00:00, 3.92it/s]
[NeMo I 2023-11-24 10:01:56 clustering_diarizer:389] Saved embedding files to /content/temp_outputs/speaker_outputs/embeddings
[NeMo I 2023-11-24 10:01:56 clustering_diarizer:287] Subsegmentation for embedding extraction: scale4, /content/temp_outputs/speaker_outputs/subsegments_scale4.json
[NeMo I 2023-11-24 10:01:56 clustering_diarizer:343] Extracting embeddings for Diarization
[NeMo I 2023-11-24 10:01:56 collections:301] Filtered duration for loading collection is 0.00 hours.
[NeMo I 2023-11-24 10:01:56 collections:302] Dataset loaded with 1148 items, total duration of 0.32 hours.
[NeMo I 2023-11-24 10:01:56 collections:304] # 1148 files loaded accounting to # 1 labels
[5/6] extract embeddings: 100%|██████████| 18/18 [00:03<00:00, 4.69it/s]
[NeMo I 2023-11-24 10:02:00 clustering_diarizer:389] Saved embedding files to /content/temp_outputs/speaker_outputs/embeddings
[NeMo I 2023-11-24 10:02:00 clustering_diarizer:287] Subsegmentation for embedding extraction: scale5, /content/temp_outputs/speaker_outputs/subsegments_scale5.json
[NeMo I 2023-11-24 10:02:00 clustering_diarizer:343] Extracting embeddings for Diarization
[NeMo I 2023-11-24 10:02:00 collections:301] Filtered duration for loading collection is 0.00 hours.
[NeMo I 2023-11-24 10:02:00 collections:302] Dataset loaded with 2296 items, total duration of 0.32 hours.
[NeMo I 2023-11-24 10:02:00 collections:304] # 2296 files loaded accounting to # 1 labels
[6/6] extract embeddings: 100%|██████████| 36/36 [00:05<00:00, 6.98it/s]
[NeMo I 2023-11-24 10:02:06 clustering_diarizer:389] Saved embedding files to /content/temp_outputs/speaker_outputs/embeddings
clustering: 100%|██████████| 1/1 [00:00<00:00, 1.37it/s]
[NeMo I 2023-11-24 10:02:06 clustering_diarizer:464] Outputs are saved in /content/temp_outputs directory
[NeMo W 2023-11-24 10:02:06 der:185] Check if each ground truth RTTMs were present in the provided manifest file. Skipping calculation of Diariazation Error Rate
[NeMo I 2023-11-24 10:02:07 msdd_models:960] Loading embedding pickle file of scale:0 at /content/temp_outputs/speaker_outputs/embeddings/subsegments_scale0_embeddings.pkl
[NeMo I 2023-11-24 10:02:07 msdd_models:960] Loading embedding pickle file of scale:1 at /content/temp_outputs/speaker_outputs/embeddings/subsegments_scale1_embeddings.pkl
[NeMo I 2023-11-24 10:02:07 msdd_models:960] Loading embedding pickle file of scale:2 at /content/temp_outputs/speaker_outputs/embeddings/subsegments_scale2_embeddings.pkl
[NeMo I 2023-11-24 10:02:07 msdd_models:960] Loading embedding pickle file of scale:3 at /content/temp_outputs/speaker_outputs/embeddings/subsegments_scale3_embeddings.pkl
[NeMo I 2023-11-24 10:02:07 msdd_models:960] Loading embedding pickle file of scale:4 at /content/temp_outputs/speaker_outputs/embeddings/subsegments_scale4_embeddings.pkl
[NeMo I 2023-11-24 10:02:07 msdd_models:960] Loading embedding pickle file of scale:5 at /content/temp_outputs/speaker_outputs/embeddings/subsegments_scale5_embeddings.pkl
[NeMo I 2023-11-24 10:02:07 msdd_models:938] Loading cluster label file from /content/temp_outputs/speaker_outputs/subsegments_scale5_cluster.label
[NeMo I 2023-11-24 10:02:07 collections:617] Filtered duration for loading collection is 0.000000.
[NeMo I 2023-11-24 10:02:07 collections:620] Total 3 session files loaded accounting to # 3 audio clips
0%| | 0/1 [00:00<?, ?it/s]
---------------------------------------------------------------------------
RuntimeError Traceback (most recent call last)
[<ipython-input-13-8cafa8c83657>](https://localhost:8080/#) in <cell line: 3>()
1 # Initialize NeMo MSDD diarization model
2 msdd_model = NeuralDiarizer(cfg=create_config(temp_path)).to("cuda")
----> 3 msdd_model.diarize()
4
5 del msdd_model
(12 intermediate frames hidden)
[/usr/local/lib/python3.10/dist-packages/nemo/collections/asr/modules/msdd_diarizer.py](https://localhost:8080/#) in conv_forward(self, conv_input, conv_module, bn_module, first_layer)
417 conv_out = conv_module(conv_input)
418 conv_out = conv_out.permute(0, 2, 1, 3) if not first_layer else conv_out
--> 419 conv_out = conv_out.reshape(self.batch_size, self.length, self.cnn_output_ch, self.emb_dim)
420 conv_out = conv_out.unsqueeze(2).flatten(0, 1)
421 conv_out = bn_module(conv_out.permute(0, 3, 2, 1)).permute(0, 3, 2, 1)
RuntimeError: shape '[138, 50, 16, 192]' is invalid for input of size 84787200
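For context, a quick sanity check on the numbers in the error (an illustration of why the reshape fails, not a fix): torch's reshape requires the product of the target dimensions to equal the tensor's element count, and here the input holds exactly four times as many elements as the requested shape `[batch_size, length, cnn_output_ch, emb_dim]` accounts for, which suggests one of those four values the model computed is off by a factor of 4 for this input.

```python
# Diagnostic sketch: check the requested shape against the reported element count.
# The shape names come from the failing line in msdd_diarizer.py; the numbers are
# taken verbatim from the RuntimeError message above.
from math import prod

target_shape = (138, 50, 16, 192)  # (batch_size, length, cnn_output_ch, emb_dim)
reported_numel = 84_787_200        # input size reported by the RuntimeError

expected_numel = prod(target_shape)
print(expected_numel)                      # 21196800
print(reported_numel / expected_numel)     # 4.0 -- input is exactly 4x too large
```

So the mismatch is not random: the input tensor is an exact multiple of the expected size, which is usually a sign of a configuration mismatch (e.g. in multiscale or batch settings) rather than corrupted data.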
Do you have any hint on how to solve this issue?