
Releases: speechbrain/speechbrain

v1.0.0

26 Feb 20:54

Please, help our community project. Star on GitHub!

🚀 What's New in SpeechBrain 1.0?

📅 In February 2024, we released SpeechBrain 1.0, the result of a year-long collaborative effort by a large international network of developers led by our exceptional core development team.

📊 Some Numbers:

  • SpeechBrain has evolved into a significant project and stands among the most widely used open-source toolkits for speech processing.
  • Over 140 developers have contributed to our repository, which has earned more than 7.3k stars on GitHub.
  • Monthly downloads from PyPI have reached an impressive 200k.
  • The toolkit has expanded to over 200 recipes for Conversational AI, featuring more than 100 pretrained models on HuggingFace.

🌟 Key Updates:

  • SpeechBrain 1.0 introduces significant advancements, expanding support for diverse datasets and tasks, including NLP and EEG processing.

  • The toolkit now excels in Conversational AI and various sequence processing applications.

  • Improvements encompass key techniques in speech recognition: streamable Conformer transducers, integration with K2 for Finite State Transducers, CTC decoding with n-gram rescoring, a new CTC/joint-attention beam search interface, enhanced compatibility with HuggingFace models (including GPT-2 and Llama 2), and refined data augmentation, training, and inference processes.

  • We have created a new repository dedicated to benchmarks, accessible here. At present, this repository features benchmarks for various domains, including speech self-supervised models (MP3S), continual learning (CL-MASR), and EEG processing (SpeechBrain-MOABB).

For detailed technical information, please refer to the section below.

🔄 Breaking Changes

People familiar with SpeechBrain know that we do our best to avoid backward-incompatible changes. Nevertheless, this new major version presented an opportunity for significant enhancements and refactorings.

  1. 🤗 HuggingFace Interface Refactor:

    • Previously, our interfaces were limited to specific models like Whisper, HuBERT, WavLM, and wav2vec 2.0.
    • We've refactored the interface to be more general, now supporting any transformer model from HuggingFace including LLMs.
    • Simply inherit from our new interface and enjoy the flexibility.
    • The updated interfaces can be accessed here.
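    As an illustration, here is a minimal sketch of wrapping a HuggingFace model by inheriting from the new base class (the module path, class name, and attribute names are assumptions based on the 1.0 layout, not a verbatim excerpt):

    ```python
    # Hypothetical sketch: subclass the generic HuggingFace wrapper.
    import torch
    from speechbrain.lobes.models.huggingface_transformers.huggingface import (
        HFTransformersInterface,
    )

    class MyHFEncoder(HFTransformersInterface):
        def forward(self, wav):
            # self.model is assumed to hold the loaded HuggingFace model.
            return self.model(wav).last_hidden_state

    encoder = MyHFEncoder(source="facebook/wav2vec2-base-960h", save_path="hf_cache")
    hidden_states = encoder(torch.rand(1, 16000))
    ```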
  2. 🔍 BeamSearch Refactor:

    • The previous beam search interface, while functional, was challenging to comprehend and modify because the search and rescoring logic were intertwined.
    • We've introduced a new interface where scoring and search are separated, managed by distinct functions, resulting in simpler and more readable code.
    • This update allows users to easily incorporate various scorers, including n-gram LM and custom heuristics, in the search part.
    • Additionally, support for pure CTC training and decoding, batch and GPU decoding, partial or full candidate scoring, and N-best hypothesis output with neural LM rescorers has been added.
    • An interface to K2 for search based on Finite State Transducers (FST) is now available.
    • The updated decoders are available here.
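    To illustrate the separation, here is a minimal sketch of building a scorer and handing it to a beam searcher (the placeholder modules and the exact argument names are assumptions):

    ```python
    # Hypothetical sketch: scoring and search are now separate objects.
    from speechbrain.decoders.scorer import CoverageScorer, ScorerBuilder
    from speechbrain.decoders.seq2seq import S2SRNNBeamSearcher

    # Scorers are declared independently of the search procedure.
    scorer = ScorerBuilder(
        full_scorers=[CoverageScorer(vocab_size=1000)],
        weights={"coverage": 1.5},
    )

    searcher = S2SRNNBeamSearcher(
        embedding=emb,      # decoder embedding (placeholder, defined elsewhere)
        decoder=dec,        # attentional RNN decoder (placeholder)
        linear=seq_lin,     # output projection (placeholder)
        bos_index=0,
        eos_index=0,
        min_decode_ratio=0.0,
        max_decode_ratio=1.0,
        beam_size=10,
        scorer=scorer,      # plug in n-gram LMs, custom heuristics, etc. here
    )
    ```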
  3. 🎨 Data Augmentation Refactor:

    • The data augmentation capabilities have been enhanced, offering users access to various functions in speechbrain/augment.
    • New techniques, such as CodecAugment, RandomShift (Time), RandomShift (Frequency), DoClip, RandAmp, ChannelDrop, ChannelSwap, CutCat, and DropBitResolution, have been introduced.
    • Augmentation can now be customized and combined using the Augmenter interface in speechbrain/augment/augmenter.py, providing more control during training.
    • Take a look here for a tutorial on speech augmentation.
    • The updated augmenters are available here.
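    A minimal sketch of combining augmentations through the Augmenter interface (argument names are assumptions based on speechbrain/augment/augmenter.py):

    ```python
    # Hypothetical sketch: compose augmentations with the Augmenter.
    import torch
    from speechbrain.augment.augmenter import Augmenter
    from speechbrain.augment.time_domain import DropChunk, DropFreq

    augmenter = Augmenter(
        parallel_augment=False,  # chain the augmentations sequentially
        concat_original=True,    # keep the clean signals in the batch
        min_augmentations=1,
        max_augmentations=2,
        augmentations=[DropFreq(), DropChunk()],
    )

    signals = torch.rand(4, 16000)  # [batch, time]
    lengths = torch.ones(4)         # relative lengths
    aug_signals, aug_lengths = augmenter(signals, lengths)
    ```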
  4. 🧠 Brain Class Refactor:

    • The fit_batch method in the Brain Class has been refactored to minimize the need for overrides in training scripts.
    • Native support for different precisions (fp32, fp16, bf16), mixed precision, compilation, multiple optimizers, and improved multi-GPU training with torchrun is now available.
    • Take a look at the refactored brain class here.
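    For instance, selecting a precision no longer requires overriding fit_batch; here is a minimal sketch (the toy modules and loss are placeholders, and the run_opts key is an assumption):

    ```python
    # Hypothetical sketch: request bf16 training via run_opts.
    import torch
    import speechbrain as sb

    class ToyBrain(sb.Brain):
        def compute_forward(self, batch, stage):
            inputs, _ = batch.sig
            return self.modules.model(inputs)

        def compute_objectives(self, predictions, batch, stage):
            targets, _ = batch.sig
            return torch.nn.functional.mse_loss(predictions, targets)

    brain = ToyBrain(
        modules={"model": torch.nn.Linear(16000, 16000)},
        opt_class=lambda params: torch.optim.Adam(params, lr=1e-3),
        run_opts={"precision": "bf16"},  # fp32, fp16, or bf16
    )
    ```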
  5. 🔍 Inference Interfaces Refactor:

    • Inference interfaces, once stored in a single file (speechbrain/pretrained/interfaces.py), are now organized into smaller libraries in speechbrain/inference, enhancing clarity and intuitiveness.
    • You can access the new inference interfaces here.
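    For example, transcribing a file with the relocated ASR interface (the model identifier refers to a published SpeechBrain model on HuggingFace):

    ```python
    # The ASR inference interface now lives under speechbrain.inference.
    from speechbrain.inference.ASR import EncoderDecoderASR

    asr_model = EncoderDecoderASR.from_hparams(
        source="speechbrain/asr-crdnn-rnnlm-librispeech",
        savedir="pretrained_models/asr-crdnn-rnnlm-librispeech",
    )
    print(asr_model.transcribe_file("example.wav"))
    ```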

🔊 Automatic Speech Recognition

  • Developed a new recipe for training a Streamable Conformer Transducer using the LibriSpeech dataset (accessible here). The streamable model achieves a Word Error Rate (WER) of 2.72% on the test-clean subset.
  • Implemented a dedicated inference interface to support streamable ASR (accessible here; a usage sketch follows this list).
  • New models, including HyperConformer and Branchformer, have been introduced. Examples of recipes utilizing them can be found here.
  • Additional support for datasets like RescueSpeech, CommonVoice 14.0, AMI, and Tedlium 2.
  • The ASR search pipeline has undergone a complete refactoring and enhancement (see comment above).
  • A new recipe for Bayesian ASR has been added here.
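A minimal sketch of streaming inference with the new interface (the chunk-size values are illustrative and the exact signature is an assumption):

```python
# Sketch: streamable ASR decoding in fixed-size chunks.
from speechbrain.inference.ASR import StreamingASR
from speechbrain.utils.dynamic_chunk_training import DynChunkTrainConfig

asr = StreamingASR.from_hparams(
    source="speechbrain/asr-streaming-conformer-librispeech",
    savedir="pretrained_models/asr-streaming-conformer-librispeech",
)
# Decode with 24-frame chunks and 4 chunks of left context (illustrative values).
print(asr.transcribe_file("example.wav", DynChunkTrainConfig(24, 4)))
```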

🔄 Interface with Kaldi2 (K2-FSA)

  • Integration of an interface that seamlessly connects SpeechBrain with K2-FSA, allowing for constrained search and more.
  • Support for K2 CTC training and lexicon decoding, along with integration of K2 HLG and n-gram rescoring.
  • Competitive results achieved with Wav2vec2 on LibriSpeech test sets.
  • Explore an example recipe utilizing K2 here.

🎙 Speech Synthesis (TTS)

🌐 Speech-to-Speech Translation:

  • Introduction of new recipes for CVSS datasets and IWSLT 2022 Low-resource Task, based on mBART/NLLB and SAMU wav2vec.

🌟 Speech Generation

  • Implementation of diffusion and latent diffusion techniques with an example recipe showcased on AudioMNIST.

🎧 Interpretability of Audio Signals

  • Implementation of Learning to Interpret and PIQ techniques with example recipes demonstrated on ESC50.

😊 Speech Emotion Diarization

  • Support for Speech Emotion Diarization, featuring an example recipe on the Zaion Emotion Dataset. See the training recipe here.

🎙️ Speaker Recognition

🔊 Speech Enhancement

  • Release of a new Speech Enhancement baseline based on the DNS dataset.

🎵 Discrete Audio Representations

  • Support for pretrained models with discrete audio representations, including EnCodec and DAC.
  • Support for discretization of continuous represen...

v0.5.16

22 Nov 02:28

SpeechBrain 0.5.16 will be the last minor version of SpeechBrain before the major release of SpeechBrain 1.0.

In this minor version, we have focused on refining the existing features without introducing any interface changes, ensuring a seamless transition to SpeechBrain 1.0 where backward incompatible modifications will take place.

Key Highlights of SpeechBrain 0.5.16:

Bug Fixes: Numerous small fixes have been implemented to enhance the overall stability and performance of SpeechBrain.

Testing and Documentation: We have dedicated efforts to improve our testing infrastructure and documentation, ensuring a more robust and user-friendly experience.

Expanded Model and Dataset Support: SpeechBrain 0.5.16 introduces support for several new models and datasets, enhancing the versatility of the platform. For a detailed list, please refer to the commits below.

Stay informed and get ready for the groundbreaking SpeechBrain 1.0, where we will unveil substantial changes and exciting new features.

Thank you for being a part of the SpeechBrain community!

Commits

  • [cea36b4]: Update README.md (Mirco Ravanelli) #1599
  • [cead130]: Updated README.md (prometheus) #975
  • [779c620]: Update README.md (Mirco Ravanelli) #2124
  • [32af2ac]: update requirement (to avoid deprecation error) (Mirco Ravanelli) #975
  • [b039df1]: small fixes (Mirco Ravanelli) #975
  • [07e7c73]: small fixes (Mirco Ravanelli) #975
  • [dac6842]: Update README.md (Mirco Ravanelli) #975
  • [75f4c66]: Update README.md (Mirco Ravanelli) #975
  • [327a3f5]: Fixed SSVEP yaml file (prometheus) #975
  • [067d94e]: Fixed conflicts (prometheus) #975
  • [331741d]: Fixed read/write conflicts mne config file when training many models in parallel (prometheus) #975
  • [0f25d5b]: Added hparam files for other architectures (prometheus) #975
  • [9ba76e3]: Updated LMDA, forcing odd kernel size in depth attention (prometheus) #975
  • [6336200]: Fixed activation in LMDA (prometheus) #975
  • [1593cc4]: Fixed issue in deepconvnet (prometheus) #975
  • [2f0f5f0]: Fixed issue with shallowconvnet (prometheus) #975
  • [8f70136]: Fixed issue with lmda (prometheus) #975
  • [ac4f9e4]: Merge remote-tracking branch 'origin/develop' into fixeval (Adel Moumen) #2123
  • [cdce80c]: fix ddp issue with loading a key (Adel Moumen) #2128
  • [66633a0]: Added template yaml files (prometheus) #975
  • [6f631a7]: minor additions for tests (pradnya-git-dev) #2120
  • [331acdb]: add notes on tests with non-default gpu (Mirco Ravanelli) #2130
  • [091b3ce]: fixed hard-coded device (Mirco Ravanelli) #2130
  • [cc72c9e]: fixed hard-coded device (Mirco Ravanelli) #2130
  • [c60e606]: fixed hard-coded device (Mirco Ravanelli) #2130
  • [253859e]: Resolve paths so relative works too (Aku Rouhe) #2128
  • [8a98401]: small fix on orion flag (Mirco Ravanelli) #975
  • [7da9a95]: extend fix to all files (Mirco Ravanelli) #975
  • [4b09ff2]: fix style (Mirco Ravanelli) #975
  • [ced2922]: Merge remote-tracking branch 'upstream/develop' into eeg_decoding (Mirco Ravanelli) #975
  • [5e070a2]: fix useless file (Mirco Ravanelli) #975
  • [46565cf]: Merge branch 'develop' of https://github.com/speechbrain/speechbrain into develop (xuechenliu) #2142
  • [19235f2]: Merge remote-tracking branch 'upstream/Adel-Moumen-revert_commit_ddp' into revert_commit_ddp (Adel Moumen) #2128
  • [2fb247f]: Save the checkpoint folder and meta only on the main process and communicate to all procs (Peter Plantinga) #2132
  • [f37d433]: Only broadcast checkpoint folder if distributed (Peter Plantinga) #2132
  • [e23da7d]: Initialize external loggers only on main process (Peter Plantinga) #2134
  • [67b1255]: fixes (BenoitWang) #2119
  • [70d8901]: Merge branch 'develop' into fs2_internal_alignment (Yingzhi WANG) #2119
  • [5565073]: Add file check on all recipe tests (#2126) (Mirco Ravanelli) #2126
  • [76923a4]: removeused varibles, add exception types (BenoitWang) #2119
  • [[0a18729](https://github.com/speechbrain/speechbrain/commit/0a187291fa95323929ac...

v0.5.15

22 Jul 18:07

SpeechBrain 0.5.15 Release Notes

We are thrilled to announce the release of SpeechBrain version 0.5.15! This new version represents a significant step forward for our open-source Conversational AI toolkit. The core team, along with a rapidly growing network of contributors, has worked diligently to enhance and expand the toolkit while addressing various issues.

What's New?

This release marks a crucial point as it will likely be the final minor version before the highly anticipated SpeechBrain 1.0, scheduled for release in the coming months. We have achieved notable milestones in this version, and a summary of the key achievements is presented below. For a comprehensive list of all changes, please refer to the detailed notes at the end.

Notable Achievements

  1. Benchmark Repository:
    We are proud to introduce the benchmark repository, which aims to provide standard recipes for researchers to benchmark and compare different techniques and models. Currently, the following benchmarks are available:

    • CL-MASR: Evaluates continual learning techniques for speech recognition in new languages.
    • MP3S Benchmarks: Assesses speech self-supervised representations across various tasks and with different downstream models (multi-probe).
  2. Enhanced User Experience:
    We've made it more convenient for our users to access logs and checkpoints by migrating the logs and output folders from Google Drive to Dropbox.

  3. New Models with Improved Performance:
    We implemented a modified FastSpeech 2, which offers efficiency and strong performance. We've also made significant strides in enhancing performance on LibriSpeech, thanks to the implementation of better Conformers and Branchformers. Additionally, we've introduced a performant Conformer Transducer and the SLI-GRU model.

  4. Post-hoc Interpretability Techniques:
    We now offer improved support for post-hoc interpretability techniques. Refer to the ESC50 recipe for more information.

  5. New Datasets:
    We've incorporated recipes for new datasets, including the recently released RescueSpeech (speech recognition in search-and-rescue domains) and the Zaion Emotion Dataset for Speech Emotion Recognition.

  6. Enhanced Korean ASR:
    We've improved the KsponSpeech recipes for Korean Automatic Speech Recognition.

  7. Improved Recipe Tests:
    We've taken steps to enhance recipe tests, ensuring better reliability and performance.

  8. Whisper Fixes:
    We've fixed Whisper recipes and interfaces in a way that maintains backward compatibility. This was necessary to address interface changes made in the original model.

  9. Various Fixes:
    In addition to the above achievements, we've addressed several other issues, including gradient accumulation and various minor fixes.

Thank you to our dedicated community of contributors and users for making this release possible! We invite you to explore the new features and improvements in SpeechBrain 0.5.15 and look forward to the upcoming release of SpeechBrain 1.0. Happy SpeechBrain-ing!

For a complete list of changes, please refer to the detailed release notes below.


SpeechBrain v0.5.14

24 Mar 17:40

This is a minor yet important release. It significantly increases the number of available features while fixing many small bugs and issues. A summary of the achievements of this release is given below; a complete, detailed list of all the changes can be found at the bottom of this release note.

Notable achievements

  • 22 new contributors, thank you so much, everyone!
  • 31 new recipes (ASR, SLU, AST, AER, Interpretability, SSL).
  • Fully automatic recipe testing.
  • Increased coverage for the continuous integration over the code, URLs, YAML, recipes, and HuggingFace models.
  • New Conformer Large model for ASR.
  • Integration of Whisper for fine-tuning or inference (see the sketch after this list).
  • Full pre-training of wav2vec2 entirely re-implemented AND documented.
  • Low-resource Speech Translation with IWSLT.
  • Many other novelties... see below.
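As referenced above, a minimal sketch of Whisper inference through the pretrained interface of the 0.5.x series (the interface and model names are assumptions based on the published checkpoints):

```python
# Sketch: Whisper inference via the easy-inference interface (0.5.x).
from speechbrain.pretrained import WhisperASR

asr = WhisperASR.from_hparams(
    source="speechbrain/asr-whisper-large-v2-commonvoice-fr",
    savedir="pretrained_models/asr-whisper-large-v2-commonvoice-fr",
)
print(asr.transcribe_file("example.wav"))
```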


SpeechBrain v0.5.13

29 Aug 16:25

This is a minor release with better dependency version specification. We note that SpeechBrain is compatible with PyTorch 1.12, and the updated package reflects this. See the issue linked next to each commit for more details about the corresponding changes.

Commit summary

  • [edb7714]: Adding no_sync and on_fit_batch_end method to core (Rudolf Arseni Braun) #1449
  • [07155e9]: G2P fixes (flexthink) #1473
  • [6602dab]: fix for #1469, minimal testing for profiling (anautsch) #1476
  • [abbfab9]: test clean-ups: passes linters; doctests; unit & integration tests; load-yaml on cpu (anautsch) #1487
  • [1a16b41]: fix ddp incorrect command (=) #1498
  • [0b0ec9d]: using no_sync() in fit_batch() of core.py (Rudolf Arseni Braun) #1449
  • [5c9b833]: Remove torch maximum compatible version (Peter Plantinga) #1504
  • [d0f4352]: remove limit for HF hub as it does not work with colab (Titouan) #1508
  • [b78f6f8]: Add revision to hub (Titouan) #1510
  • [2c491a4]: fix transducer loss inputs devices (Adel Moumen) #1511
  • [4972f76]: missing space in install command (pehonnet) #1512
  • [6bc72af]: Fixing shuffle argument for distributed sampler in core.py (Rudolf Arseni Braun) #1518
  • [df7acd9]: Added the link for example results (cem) #1523
  • [5bae6df]: add LinearWarmupScheduler (Ge Li) #1537
  • [2edd7ee]: updating scipy version in requirements.txt. (Nauman Dawalatabad) #1546

SpeechBrain v0.5.12

26 Jun 20:19

Release Notes - SpeechBrain v0.5.12

We worked very hard and we are very happy to announce the new version of SpeechBrain!

SpeechBrain 0.5.12 significantly expands the toolkit without introducing any major interface changes. I would like to warmly thank the many contributors who made this possible.

The main changes are the following:

A) Text-to-Speech: We developed the first TTS system of SpeechBrain. You can find it here. The system relies on Tacotron2 + HiFiGAN (as vocoder). The models, coupled with an easy-inference interface, are available on HuggingFace.
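Inference follows the usual easy-inference pattern; here is a minimal sketch based on the published LJSpeech models (treat the sample rate and identifiers as assumptions):

```python
# Sketch: text-to-speech with the Tacotron2 + HiFiGAN easy-inference interfaces.
import torchaudio
from speechbrain.pretrained import HIFIGAN, Tacotron2

tacotron2 = Tacotron2.from_hparams(source="speechbrain/tts-tacotron2-ljspeech")
hifi_gan = HIFIGAN.from_hparams(source="speechbrain/tts-hifigan-ljspeech")

mel_output, mel_length, alignment = tacotron2.encode_text("Hello SpeechBrain")
waveforms = hifi_gan.decode_batch(mel_output)  # [batch, 1, time]
torchaudio.save("tts_example.wav", waveforms.squeeze(1), 22050)
```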

B) Grapheme-to-Phoneme (G2P): We developed an advanced Grapheme-to-Phoneme system. You can find the code here. The current version significantly outperforms our previous model.

C) Speech Separation:

  1. We developed a novel version of the SepFormer called the Resource-Efficient SepFormer (RE-SepFormer). The code is available here and the pre-trained model (with an easy-inference interface) here.
  2. We released a recipe for Binaural speech separation with WSJMix. See the code here.
  3. We released a new recipe with the AIShell mix dataset. You can see the code here.

D) Speech Enhancement:

  1. We released the SepFormer model for speech enhancement. The code is here, while the pre-trained model (with an easy-inference interface) is here. A usage sketch follows this list.
  2. We implemented WideResNet for speech enhancement and used it for mimic-loss-based speech enhancement. The code is here and the pretrained model (with an easy-inference interface) is here.
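As mentioned in item 1, here is a usage sketch for SepFormer-based enhancement through the easy-inference interface (the model source is an assumption based on the published checkpoints):

```python
# Sketch: SepFormer speech enhancement via the easy-inference interface.
from speechbrain.pretrained import SepformerSeparation as separator

model = separator.from_hparams(
    source="speechbrain/sepformer-wham-enhancement",
    savedir="pretrained_models/sepformer-wham-enhancement",
)
enhanced = model.separate_file(path="noisy_example.wav")
```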

E) Feature Front-ends:

  1. We now support LEAF filter banks. The code is here. You can find an example of a recipe using it here.
  2. We now support multichannel SincConv (see the code here).

F) Recipe Refactors:

  1. We refactored the VoxCeleb recipe and fixed the normalization issues. See the new code here. We also made the EER computation less memory-demanding (see here).
  2. We refactored the IEMOCAP recipe for emotion recognition. See the new code here.

G) Models for African Languages:
We now have recipes for the DVoice dataset. We currently support Darija, Swahili, Wolof, Fongbe, and Amharic. The code is available here. The pretrained model (coupled with an easy-inference interface) can be found on SpeechBrain-HuggingFace.

H) Profiler:
We implemented a model profiler that helps users while developing new models with SpeechBrain. The profiler reports potentially useful information, such as the real-time factor and many other details.
A tutorial is available here.

I) Tests:
We significantly improved the tests. In particular, we introduced the following tests: HF_repo tests, docstring checks, yaml-script consistency, recipe tests, and URL checks. This will help us scale up the project.

L) Other improvements:

  1. We now support the torchaudio RNNT loss.
  2. We improved the relative attention mechanism of the Conformer.
  3. We updated the transformer for LibriSpeech, improving the WER from 2.46% to 2.26% on test-clean. See the code here.
  4. The environmental corruption module now supports different sampling rates.
  5. Minor fixes.

SpeechBrain v0.5.11

20 Dec 04:22

Dear users,
We worked very hard, and we are very happy to announce the new version of SpeechBrain.
SpeechBrain 0.5.11 further expands the toolkit without introducing any major interface change.

The main changes are the following:

  1. We implemented several new recipes.

  2. Support for dynamic batching, with a tutorial to help users familiarize themselves with it (see the sketch after this list).

  3. Support for wav2vec training within SpeechBrain.

  4. An interface with Orion for hyperparameter tuning, with a tutorial to help users familiarize themselves with it.

  5. The torchaudio transducer loss is now supported. We also kept our Numba implementation to help users customize the transducer loss part if needed.

  6. Improved CTC segmentation.

  7. Fixed minor bugs and issues (e.g., fixed the MVDR beamformer).
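As referenced above, here is a minimal sketch of dynamic batching with the dedicated sampler (argument names are assumptions; see the tutorial for the full recipe):

```python
# Sketch: dynamic batching groups utterances of similar duration so that
# each batch has a bounded total length. Argument names are assumptions.
from speechbrain.dataio.batch import PaddedBatch
from speechbrain.dataio.dataloader import SaveableDataLoader
from speechbrain.dataio.sampler import DynamicBatchSampler

sampler = DynamicBatchSampler(
    dataset,                  # a DynamicItemDataset (placeholder, defined elsewhere)
    max_batch_length=100,     # cap on the summed durations per batch
    num_buckets=60,
    length_func=lambda x: x["duration"],
)
loader = SaveableDataLoader(dataset, batch_sampler=sampler, collate_fn=PaddedBatch)
```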

Let me thank all the amazing contributors for this achievement.
Please add a star to our project if you appreciate our efforts for the community.
Together, we are growing very fast, and we have big plans for the future.

Stay Tuned!

SpeechBrain v0.5.10

11 Sep 22:34

This version mainly expands the functionalities of SpeechBrain without adding any backward incompatibilities.

New Recipes:

  • Language Identification with CommonLanguage
  • EEG signal processing with ERPCore
  • Speech translation with Fisher-Call Home
  • Emotion Recognition with IEMOCAP
  • Voice Activity Detection with LibriParty
  • ASR with LibriSpeech wav2vec (WER=1.9 on test-clean)
  • Speech Enhancement with CoopNet
  • Speech Enhancement with SEGAN
  • Speech Separation with LibriMix, WHAM, and WHAMR
  • Support for guided attention
  • Spoken Language Understanding with SLURP

Beyond that, we fixed some minor bugs and issues.

v0.5.9

17 Jun 01:25

The main differences from the previous version are the following:

  • Added WHAM!, WHAMR!, and LibriMix recipes for speech separation
  • Compatibility with PyTorch 1.9
  • Fixed minor bugs
  • Added the SpeechBrain paper

v0.5.8

06 Jun 01:42

SpeechBrain 0.5.8 improves on the previous version in the following ways:

  • Added wav2vec support in TIMIT, CommonVoice, and AISHELL-1
  • Improved the Fluent Speech Commands recipe
  • Improved SLU recipes
  • Added a recipe for UrbanSound8k
  • Fixed small bugs
  • Fixed typos