Difference between diarize_parallel.py and diarize.py #71

filmo · 2023-08-02T20:31:09Z

I'm noticing quite a difference in sentence segmentation, and thus capitalization, when running the same audio through diarize.py vs diarize_parallel.py.

Single process

And this has been like almost a year ago, so I'm kind of going back in history, so bear with me. I had questions about, I don't know, it seemed to me that cheating days were okay. So I would be following the diet, but if I wanted

versus

Parallel process

And this has been like almost a year ago. So it's I'm kind of going back in history. So bear with me. I had questions about I don't know, it seemed to me that that cheating days were okay. So I would be following the diet. But if I wanted

Any thoughts as to why this would be the case? I had assumed they would produce very similar results, but the parallel process pretty consistently ends up fragmenting the speech into many more short sentences.
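To put a rough number on the fragmentation, the two excerpts above can be compared by sentence count. This is only a minimal sketch: the `count_sentences` helper is hypothetical (it is not part of the repo) and it uses a naive split on sentence-ending punctuation.

```python
import re


def count_sentences(text: str) -> int:
    """Naively count sentences by splitting after '.', '!' or '?'."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return len([part for part in parts if part])


# The two excerpts quoted above, copied verbatim.
single = (
    "And this has been like almost a year ago, so I'm kind of going back in "
    "history, so bear with me. I had questions about, I don't know, it seemed "
    "to me that cheating days were okay. So I would be following the diet, "
    "but if I wanted"
)
parallel = (
    "And this has been like almost a year ago. So it's I'm kind of going back "
    "in history. So bear with me. I had questions about I don't know, it "
    "seemed to me that that cheating days were okay. So I would be following "
    "the diet. But if I wanted"
)

print(count_sentences(single), count_sentences(parallel))
```

On these excerpts the parallel output splits into twice as many sentences as the single-process output, counting the trailing fragment in each.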

To what extent does random sampling affect the output of the two scripts (whether run serially or split into parallel processes)?

I am just getting started with Whisper and diarization. My preference is 'accuracy' over 'time'. If there are particular settings like beam search, temperature, or k_top/p_top that can be adjusted to improve accuracy even if it takes 2 or 3 times longer, please point me in the general direction. (It seems like the current repo is set up to use a fair number of the defaults from the various integrated packages, but I'm willing to dig in, modify, and integrate additional args.)

Thank you for putting together this repo.

MahmoudAshraf97 · 2023-08-02T21:22:20Z

Maintainer

Hi. The parallelization only separates timestamp generation from the rest of the process, since the two do not depend on each other. That means the difference comes from Whisper's randomness during generation, which is the same in both parallel and serial processing. I'd suggest experimenting with the temperature and other randomness-related arguments in the faster_whisper inference step.
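As a starting point for that experiment, here is a minimal sketch of passing less random decoding options to faster_whisper directly. The helper names, the `medium.en` model name, and the `audio.wav` path are placeholders rather than anything taken from the repo's scripts. It assumes that giving a single `temperature=0.0` (instead of a fallback list of temperatures) avoids the sampling-based retries, and that a larger `beam_size` trades speed for accuracy. Some run-to-run variation may remain regardless.

```python
def low_randomness_options(beam_size=5):
    """Decoding options intended to reduce run-to-run variation (hypothetical helper)."""
    return {
        "beam_size": beam_size,  # wider beam search: slower, usually more stable
        "temperature": 0.0,      # single value, so no sampling at higher temperatures
    }


def transcribe(audio_path, model_name="medium.en", **overrides):
    """Transcribe one file with the options above; overrides replace the defaults."""
    # Imported lazily so the options helper can be used without faster_whisper installed.
    from faster_whisper import WhisperModel

    model = WhisperModel(model_name)
    options = {**low_randomness_options(), **overrides}
    segments, _info = model.transcribe(audio_path, **options)
    return " ".join(segment.text.strip() for segment in segments)
```

Calling `transcribe("audio.wav", beam_size=10)` twice and comparing the two strings is one way to check whether the segmentation differences persist with these settings.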
