
Releases: speechbrain/speechbrain

v1.0.0

26 Feb 20:54

Please, help our community project. Star on GitHub!

🚀 What's New in SpeechBrain 1.0?

📅 In February 2024, we released SpeechBrain 1.0, the result of a year-long collaborative effort by a large international network of developers led by our exceptional core development team.

📊 Some Numbers:

  • SpeechBrain has evolved into a significant project and stands among the most widely used open-source toolkits for speech processing.
  • Over 140 developers have contributed to our repository, which has earned more than 7.3k stars on GitHub.
  • Monthly downloads from PyPI have reached an impressive 200k.
  • The toolkit has expanded to over 200 recipes for Conversational AI, featuring more than 100 pretrained models on HuggingFace.

🌟 Key Updates:

  • SpeechBrain 1.0 introduces significant advancements, expanding support for diverse datasets and tasks, including NLP and EEG processing.

  • The toolkit now excels in Conversational AI and various sequence processing applications.

  • Improvements encompass key techniques in speech recognition: streamable Conformer transducers, integration with K2 for Finite State Transducers, CTC decoding with n-gram rescoring, a new CTC/joint-attention beam search interface, enhanced compatibility with HuggingFace models (including GPT-2 and Llama 2), and refined data augmentation, training, and inference processes.

  • We have created a new repository dedicated to benchmarks, accessible here. At present, this repository features benchmarks for various domains, including speech self-supervised models (MP3S), continual learning (CL-MASR), and EEG processing (SpeechBrain-MOABB).

For detailed technical information, please refer to the section below.

🔄 Breaking Changes

People familiar with SpeechBrain know that we do our best to avoid backward-incompatible changes. Nevertheless, this new major version presented an opportunity for significant enhancements and refactorings.

  1. 🤗 HuggingFace Interface Refactor:

    • Previously, our interfaces were limited to specific models like Whisper, HuBERT, WavLM, and wav2vec 2.0.
    • We've refactored the interface to be more general, now supporting any transformer model from HuggingFace including LLMs.
    • Simply inherit from our new interface and enjoy the flexibility.
    • The updated interfaces can be accessed here.
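    As an illustration, here is a minimal sketch of wrapping a HuggingFace model by inheriting from the new base class (the module path, class name, and attribute names are assumptions based on the 1.0 layout, not a verbatim excerpt):

    ```python
    # Hypothetical sketch: subclass the generic HuggingFace wrapper.
    import torch
    from speechbrain.lobes.models.huggingface_transformers.huggingface import (
        HFTransformersInterface,
    )

    class MyHFEncoder(HFTransformersInterface):
        def forward(self, wav):
            # self.model is assumed to hold the loaded HuggingFace model.
            return self.model(wav).last_hidden_state

    encoder = MyHFEncoder(source="facebook/wav2vec2-base-960h", save_path="hf_cache")
    hidden_states = encoder(torch.rand(1, 16000))
    ```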
  2. 🔍 BeamSearch Refactor:

    • The previous beam search interface, while functional, was challenging to comprehend and modify because the search and rescoring logic were intertwined.
    • We've introduced a new interface where scoring and search are separated, managed by distinct functions, resulting in simpler and more readable code.
    • This update allows users to easily incorporate various scorers, including n-gram LM and custom heuristics, in the search part.
    • Additionally, support for pure CTC training and decoding, batch and GPU decoding, partial or full candidate scoring, and N-best hypothesis output with neural LM rescorers has been added.
    • An interface to K2 for search based on Finite State Transducers (FST) is now available.
    • The updated decoders are available here.
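    To illustrate the separation, here is a minimal sketch of building a scorer and handing it to a beam searcher (the placeholder modules and the exact argument names are assumptions):

    ```python
    # Hypothetical sketch: scoring and search are now separate objects.
    from speechbrain.decoders.scorer import CoverageScorer, ScorerBuilder
    from speechbrain.decoders.seq2seq import S2SRNNBeamSearcher

    # Scorers are declared independently of the search procedure.
    scorer = ScorerBuilder(
        full_scorers=[CoverageScorer(vocab_size=1000)],
        weights={"coverage": 1.5},
    )

    searcher = S2SRNNBeamSearcher(
        embedding=emb,      # decoder embedding (placeholder, defined elsewhere)
        decoder=dec,        # attentional RNN decoder (placeholder)
        linear=seq_lin,     # output projection (placeholder)
        bos_index=0,
        eos_index=0,
        min_decode_ratio=0.0,
        max_decode_ratio=1.0,
        beam_size=10,
        scorer=scorer,      # plug in n-gram LMs, custom heuristics, etc. here
    )
    ```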
  3. 🎨 Data Augmentation Refactor:

    • The data augmentation capabilities have been enhanced, offering users access to various functions in speechbrain/augment.
    • New techniques, such as CodecAugment, RandomShift (Time), RandomShift (Frequency), DoClip, RandAmp, ChannelDrop, ChannelSwap, CutCat, and DropBitResolution, have been introduced.
    • Augmentation can now be customized and combined using the Augmenter interface in speechbrain/augment/augmenter.py, providing more control during training.
    • Take a look here for a tutorial on speech augmentation.
    • The updated augmenters are available here.
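    A minimal sketch of combining augmentations through the Augmenter interface (argument names are assumptions based on speechbrain/augment/augmenter.py):

    ```python
    # Hypothetical sketch: compose augmentations with the Augmenter.
    import torch
    from speechbrain.augment.augmenter import Augmenter
    from speechbrain.augment.time_domain import DropChunk, DropFreq

    augmenter = Augmenter(
        parallel_augment=False,  # chain the augmentations sequentially
        concat_original=True,    # keep the clean signals in the batch
        min_augmentations=1,
        max_augmentations=2,
        augmentations=[DropFreq(), DropChunk()],
    )

    signals = torch.rand(4, 16000)  # [batch, time]
    lengths = torch.ones(4)         # relative lengths
    aug_signals, aug_lengths = augmenter(signals, lengths)
    ```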
  4. 🧠 Brain Class Refactor:

    • The fit_batch method in the Brain Class has been refactored to minimize the need for overrides in training scripts.
    • Native support for different precisions (fp32, fp16, bf16), mixed precision, compilation, multiple optimizers, and improved multi-GPU training with torchrun is now available.
    • Take a look at the refactored brain class here.
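    For instance, selecting a precision no longer requires overriding fit_batch; here is a minimal sketch (the toy modules and loss are placeholders, and the run_opts key is an assumption):

    ```python
    # Hypothetical sketch: request bf16 training via run_opts.
    import torch
    import speechbrain as sb

    class ToyBrain(sb.Brain):
        def compute_forward(self, batch, stage):
            inputs, _ = batch.sig
            return self.modules.model(inputs)

        def compute_objectives(self, predictions, batch, stage):
            targets, _ = batch.sig
            return torch.nn.functional.mse_loss(predictions, targets)

    brain = ToyBrain(
        modules={"model": torch.nn.Linear(16000, 16000)},
        opt_class=lambda params: torch.optim.Adam(params, lr=1e-3),
        run_opts={"precision": "bf16"},  # fp32, fp16, or bf16
    )
    ```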
  5. 🔍 Inference Interfaces Refactor:

    • Inference interfaces, once stored in a single file (speechbrain/pretrained/interfaces.py), are now organized into smaller libraries in speechbrain/inference, enhancing clarity and intuitiveness.
    • You can access the new inference interfaces here.
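    For example, transcribing a file with the relocated ASR interface (the model identifier refers to a published SpeechBrain model on HuggingFace):

    ```python
    # The ASR inference interface now lives under speechbrain.inference.
    from speechbrain.inference.ASR import EncoderDecoderASR

    asr_model = EncoderDecoderASR.from_hparams(
        source="speechbrain/asr-crdnn-rnnlm-librispeech",
        savedir="pretrained_models/asr-crdnn-rnnlm-librispeech",
    )
    print(asr_model.transcribe_file("example.wav"))
    ```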

🔊 Automatic Speech Recognition

  • Developed a new recipe for training a Streamable Conformer Transducer using the LibriSpeech dataset (accessible here). The streamable model achieves a Word Error Rate (WER) of 2.72% on the test-clean subset.
  • Implemented a dedicated inference interface to support streamable ASR (accessible here; a usage sketch follows this list).
  • New models, including HyperConformer and Branchformer, have been introduced. Examples of recipes utilizing them can be found here.
  • Additional support for datasets like RescueSpeech, CommonVoice 14.0, AMI, and Tedlium 2.
  • The ASR search pipeline has undergone a complete refactoring and enhancement (see comment above).
  • A new recipe for Bayesian ASR has been added here.
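A minimal sketch of streaming inference with the new interface (the chunk-size values are illustrative and the exact signature is an assumption):

```python
# Sketch: streamable ASR decoding in fixed-size chunks.
from speechbrain.inference.ASR import StreamingASR
from speechbrain.utils.dynamic_chunk_training import DynChunkTrainConfig

asr = StreamingASR.from_hparams(
    source="speechbrain/asr-streaming-conformer-librispeech",
    savedir="pretrained_models/asr-streaming-conformer-librispeech",
)
# Decode with 24-frame chunks and 4 chunks of left context (illustrative values).
print(asr.transcribe_file("example.wav", DynChunkTrainConfig(24, 4)))
```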

🔄 Interface with Kaldi2 (K2-FSA)

  • Integration of an interface that seamlessly connects SpeechBrain with K2-FSA, allowing for constrained search and more.
  • Support for K2 CTC training and lexicon decoding, along with integration of K2 HLG and n-gram rescoring.
  • Competitive results achieved with Wav2vec2 on LibriSpeech test sets.
  • Explore an example recipe utilizing K2 here.

🎙 Speech Synthesis (TTS)

🌐 Speech-to-Speech Translation:

  • Introduction of new recipes for CVSS datasets and IWSLT 2022 Low-resource Task, based on mBART/NLLB and SAMU wav2vec.

🌟 Speech Generation

  • Implementation of diffusion and latent diffusion techniques with an example recipe showcased on AudioMNIST.

🎧 Interpretability of Audio Signals

  • Implementation of Learning to Interpret and PIQ techniques with example recipes demonstrated on ESC50.

😊 Speech Emotion Diarization

  • Support for Speech Emotion Diarization, featuring an example recipe on the Zaion Emotion Dataset. See the training recipe here.

🎙️ Speaker Recognition

🔊 Speech Enhancement

  • Release of a new Speech Enhancement baseline based on the DNS dataset.

🎵 Discrete Audio Representations

  • Support for pretrained models with discrete audio representations, including EnCodec and DAC.
  • Support for discretization of continuous represen...

v0.5.16

22 Nov 02:28

SpeechBrain 0.5.16 will be the last minor version of SpeechBrain before the major release of SpeechBrain 1.0.

In this minor version, we have focused on refining the existing features without introducing any interface changes, ensuring a seamless transition to SpeechBrain 1.0 where backward incompatible modifications will take place.

Key Highlights of SpeechBrain 0.5.16:

Bug Fixes: Numerous small fixes have been implemented to enhance the overall stability and performance of SpeechBrain.

Testing and Documentation: We have dedicated efforts to improve our testing infrastructure and documentation, ensuring a more robust and user-friendly experience.

Expanded Model and Dataset Support: SpeechBrain 0.5.16 introduces support for several new models and datasets, enhancing the versatility of the platform. For a detailed list, please refer to the commits below.

Stay informed and get ready for the groundbreaking SpeechBrain 1.0, where we will unveil substantial changes and exciting new features.

Thank you for being a part of the SpeechBrain community!

Commits

  • [cea36b4]: Update README.md (Mirco Ravanelli) #1599
  • [cead130]: Updated README.md (prometheus) #975
  • [779c620]: Update README.md (Mirco Ravanelli) #2124
  • [32af2ac]: update requirement (to avoid deprecation error) (Mirco Ravanelli) #975
  • [b039df1]: small fixes (Mirco Ravanelli) #975
  • [07e7c73]: small fixes (Mirco Ravanelli) #975
  • [dac6842]: Update README.md (Mirco Ravanelli) #975
  • [75f4c66]: Update README.md (Mirco Ravanelli) #975
  • [327a3f5]: Fixed SSVEP yaml file (prometheus) #975
  • [067d94e]: Fixed conflicts (prometheus) #975
  • [331741d]: Fixed read/write conflicts mne config file when training many models in parallel (prometheus) #975
  • [0f25d5b]: Added hparam files for other architectures (prometheus) #975
  • [9ba76e3]: Updated LMDA, forcing odd kernel size in depth attention (prometheus) #975
  • [6336200]: Fixed activation in LMDA (prometheus) #975
  • [1593cc4]: Fixed issue in deepconvnet (prometheus) #975
  • [2f0f5f0]: Fixed issue with shallowconvnet (prometheus) #975
  • [8f70136]: Fixed issue with lmda (prometheus) #975
  • [ac4f9e4]: Merge remote-tracking branch 'origin/develop' into fixeval (Adel Moumen) #2123
  • [cdce80c]: fix ddp issue with loading a key (Adel Moumen) #2128
  • [66633a0]: Added template yaml files (prometheus) #975
  • [6f631a7]: minor additions for tests (pradnya-git-dev) #2120
  • [331acdb]: add notes on tests with non-default gpu (Mirco Ravanelli) #2130
  • [091b3ce]: fixed hard-coded device (Mirco Ravanelli) #2130
  • [cc72c9e]: fixed hard-coded device (Mirco Ravanelli) #2130
  • [c60e606]: fixed hard-coded device (Mirco Ravanelli) #2130
  • [253859e]: Resolve paths so relative works too (Aku Rouhe) #2128
  • [8a98401]: small fix on orion flag (Mirco Ravanelli) #975
  • [7da9a95]: extend fix to all files (Mirco Ravanelli) #975
  • [4b09ff2]: fix style (Mirco Ravanelli) #975
  • [ced2922]: Merge remote-tracking branch 'upstream/develop' into eeg_decoding (Mirco Ravanelli) #975
  • [5e070a2]: fix useless file (Mirco Ravanelli) #975
  • [46565cf]: Merge branch 'develop' of https://github.com/speechbrain/speechbrain into develop (xuechenliu) #2142
  • [19235f2]: Merge remote-tracking branch 'upstream/Adel-Moumen-revert_commit_ddp' into revert_commit_ddp (Adel Moumen) #2128
  • [2fb247f]: Save the checkpoint folder and meta only on the main process and communicate to all procs (Peter Plantinga) #2132
  • [f37d433]: Only broadcast checkpoint folder if distributed (Peter Plantinga) #2132
  • [e23da7d]: Initialize external loggers only on main process (Peter Plantinga) #2134
  • [67b1255]: fixes (BenoitWang) #2119
  • [70d8901]: Merge branch 'develop' into fs2_internal_alignment (Yingzhi WANG) #2119
  • [5565073]: Add file check on all recipe tests (#2126) (Mirco Ravanelli) #2126
  • [76923a4]: removeused varibles, add exception types (BenoitWang) #2119
  • [[0a18729](https://github.com/speechbrain/speechbrain/commit/0a187291fa95323929ac...

v0.5.15

22 Jul 18:07

SpeechBrain 0.5.15 Release Notes

We are thrilled to announce the release of SpeechBrain version 0.5.15! This new version represents a significant step forward for our open-source Conversational AI toolkit. The core team, along with a rapidly growing network of contributors, has worked diligently to enhance and expand the toolkit while addressing various issues.

What's New?

This release marks a crucial point as it will likely be the final minor version before the highly anticipated SpeechBrain 1.0, scheduled for release in the coming months. We have achieved notable milestones in this version, and a summary of the key achievements is presented below. For a comprehensive list of all changes, please refer to the detailed notes at the end.

Notable Achievements

  1. Benchmark Repository:
    We are proud to introduce the benchmark repository, which aims to provide standard recipes for researchers to benchmark and compare different techniques and models. Currently, the following benchmarks are available:

    • CL-MASR: Evaluates continual learning techniques for speech recognition in new languages.
    • MP3S Benchmarks: Assesses speech self-supervised representations across various tasks and with different downstream models (multi-probe).
  2. Enhanced User Experience:
    We've made it more convenient for our users to access logs and checkpoints by migrating the logs and output folders from Google Drive to Dropbox.

  3. New Models with Improved Performance:
    We implemented a modified FastSpeech 2, which offers efficiency and strong performance. We've also made significant strides in enhancing performance on LibriSpeech, thanks to the implementation of better Conformers and Branchformers. Additionally, we've introduced a performant Conformer Transducer and the SLI-GRU model.

  4. Post-hoc Interpretability Techniques:
    We now offer improved support for post-hoc interpretability techniques. Refer to the ESC50 recipe for more information.

  5. New Datasets:
    We've incorporated recipes for new datasets, including the recently released RescueSpeech (speech recognition in search-and-rescue domains) and the Zaion Emotion Dataset for Speech Emotion Recognition.

  6. Enhanced Korean ASR:
    We've improved the KsponSpeech recipes for Korean Automatic Speech Recognition.

  7. Improved Recipe Tests:
    We've taken steps to enhance recipe tests, ensuring better reliability and performance.

  8. Whisper Fixes:
    We've fixed Whisper recipes and interfaces in a way that maintains backward compatibility. This was necessary to address interface changes made in the original model.

  9. Various Fixes:
    In addition to the above achievements, we've addressed several other issues, including gradient accumulation and various minor fixes.

Thank you to our dedicated community of contributors and users for making this release possible! We invite you to explore the new features and improvements in SpeechBrain 0.5.15 and look forward to the upcoming release of SpeechBrain 1.0. Happy SpeechBrain-ing!

For a complete list of changes, please refer to the detailed release notes below.


SpeechBrain v0.5.14

24 Mar 17:40

This is a minor yet important release. It significantly increases the number of available features while fixing many small bugs and issues. A summary of the achievements of this release is given below; a complete, detailed list of all the changes can be found at the bottom of this release note.

Notable achievements

  • 22 new contributors, thank you so much, everyone!
  • 31 new recipes (ASR, SLU, AST, AER, Interpretability, SSL).
  • Fully automatic recipe testing.
  • Increased coverage for the continuous integration over the code, URLs, YAML, recipes, and HuggingFace models.
  • New Conformer Large model for ASR.
  • Integration of Whisper for fine-tuning or inference (see the sketch after this list).
  • Full pre-training of wav2vec2 entirely re-implemented AND documented.
  • Low-resource Speech Translation with IWSLT.
  • Many other novelties... see below.
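As referenced above, a minimal sketch of Whisper inference through the pretrained interface of the 0.5.x series (the interface and model names are assumptions based on the published checkpoints):

```python
# Sketch: Whisper inference via the easy-inference interface (0.5.x).
from speechbrain.pretrained import WhisperASR

asr = WhisperASR.from_hparams(
    source="speechbrain/asr-whisper-large-v2-commonvoice-fr",
    savedir="pretrained_models/asr-whisper-large-v2-commonvoice-fr",
)
print(asr.transcribe_file("example.wav"))
```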


SpeechBrain v0.5.13

29 Aug 16:25

This is a minor release with better dependency version specification. We note that SpeechBrain is compatible with PyTorch 1.12, and the updated package reflects this. See the issue linked next to each commit for more details about the corresponding changes.

Commit summary

  • [edb7714]: Adding no_sync and on_fit_batch_end method to core (Rudolf Arseni Braun) #1449
  • [07155e9]: G2P fixes (flexthink) #1473
  • [6602dab]: fix for #1469, minimal testing for profiling (anautsch) #1476
  • [abbfab9]: test clean-ups: passes linters; doctests; unit & integration tests; load-yaml on cpu (anautsch) #1487
  • [1a16b41]: fix ddp incorrect command (=) #1498
  • [0b0ec9d]: using no_sync() in fit_batch() of core.py (Rudolf Arseni Braun) #1449
  • [5c9b833]: Remove torch maximum compatible version (Peter Plantinga) #1504
  • [d0f4352]: remove limit for HF hub as it does not work with colab (Titouan) #1508
  • [b78f6f8]: Add revision to hub (Titouan) #1510
  • [2c491a4]: fix transducer loss inputs devices (Adel Moumen) #1511
  • [4972f76]: missing space in install command (pehonnet) #1512
  • [6bc72af]: Fixing shuffle argument for distributed sampler in core.py (Rudolf Arseni Braun) #1518
  • [df7acd9]: Added the link for example results (cem) #1523
  • [5bae6df]: add LinearWarmupScheduler (Ge Li) #1537
  • [2edd7ee]: updating scipy version in requirements.txt. (Nauman Dawalatabad) #1546

SpeechBrain v0.5.12

26 Jun 20:19

Release Notes - SpeechBrain v0.5.12

We worked very hard and we are very happy to announce the new version of SpeechBrain!

SpeechBrain 0.5.12 significantly expands the toolkit without introducing any major interface changes. I would like to warmly thank the many contributors who made this possible.

The main changes are the following:

A) Text-to-Speech: We developed the first TTS system of SpeechBrain. You can find it here. The system relies on Tacotron2 + HiFiGAN (as vocoder). The models, coupled with an easy-inference interface, are available on HuggingFace.
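Inference follows the usual easy-inference pattern; here is a minimal sketch based on the published LJSpeech models (treat the sample rate and identifiers as assumptions):

```python
# Sketch: text-to-speech with the Tacotron2 + HiFiGAN easy-inference interfaces.
import torchaudio
from speechbrain.pretrained import HIFIGAN, Tacotron2

tacotron2 = Tacotron2.from_hparams(source="speechbrain/tts-tacotron2-ljspeech")
hifi_gan = HIFIGAN.from_hparams(source="speechbrain/tts-hifigan-ljspeech")

mel_output, mel_length, alignment = tacotron2.encode_text("Hello SpeechBrain")
waveforms = hifi_gan.decode_batch(mel_output)  # [batch, 1, time]
torchaudio.save("tts_example.wav", waveforms.squeeze(1), 22050)
```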

B) Grapheme-to-Phoneme (G2P): We developed an advanced Grapheme-to-Phoneme system. You can find the code here. The current version significantly outperforms our previous model.

C) Speech Separation:

  1. We developed a novel version of the SepFormer called the Resource-Efficient SepFormer (RE-SepFormer). The code is available here and the pre-trained model (with an easy-inference interface) here.
  2. We released a recipe for Binaural speech separation with WSJMix. See the code here.
  3. We released a new recipe with the AIShell mix dataset. You can see the code here.

D) Speech Enhancement:

  1. We released the SepFormer model for speech enhancement. The code is here, while the pre-trained model (with an easy-inference interface) is here. A usage sketch follows this list.
  2. We implemented WideResNet for speech enhancement and used it for mimic-loss-based speech enhancement. The code is here and the pretrained model (with an easy-inference interface) is here.
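As mentioned in item 1, here is a usage sketch for SepFormer-based enhancement through the easy-inference interface (the model source is an assumption based on the published checkpoints):

```python
# Sketch: SepFormer speech enhancement via the easy-inference interface.
from speechbrain.pretrained import SepformerSeparation as separator

model = separator.from_hparams(
    source="speechbrain/sepformer-wham-enhancement",
    savedir="pretrained_models/sepformer-wham-enhancement",
)
enhanced = model.separate_file(path="noisy_example.wav")
```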

E) Feature Front-ends:

  1. We now support LEAF filter banks. The code is here. You can find an example of a recipe using it here.
  2. We now support multichannel SincConv (see the code here).

F) Recipe Refactors:

  1. We refactored the VoxCeleb recipe and fixed the normalization issues. See the new code here. We also made the EER computation less memory-demanding (see here).
  2. We refactored the IEMOCAP recipe for emotion recognition. See the new code here.

G) Models for African Languages:
We now have recipes for the DVoice dataset. We currently support Darija, Swahili, Wolof, Fongbe, and Amharic. The code is available here. The pretrained model (coupled with an easy-inference interface) can be found on SpeechBrain-HuggingFace.

H) Profiler:
We implemented a model profiler that helps users while developing new models with SpeechBrain. The profiler reports potentially useful information, such as the real-time factor and many other details.
A tutorial is available here.

I) Tests:
We significantly improved the tests. In particular, we introduced the following tests: HF_repo tests, docstring checks, yaml-script consistency, recipe tests, and URL checks. This will help us scale up the project.

L) Other improvements:

  1. We now support the torchaudio RNNT loss.
  2. We improved the relative attention mechanism of the Conformer.
  3. We updated the transformer for LibriSpeech, improving the WER from 2.46% to 2.26% on test-clean. See the code here.
  4. The environmental corruption module now supports different sampling rates.
  5. Minor fixes.

SpeechBrain v0.5.11

20 Dec 04:22

Dear users,
We worked very hard, and we are very happy to announce the new version of SpeechBrain.
SpeechBrain 0.5.11 further expands the toolkit without introducing any major interface change.

The main changes are the following:

  1. We implemented several new recipes.

  2. Support for dynamic batching, with a tutorial to help users familiarize themselves with it (see the sketch after this list).

  3. Support for wav2vec training within SpeechBrain.

  4. An interface with Orion for hyperparameter tuning, with a tutorial to help users familiarize themselves with it.

  5. The torchaudio transducer loss is now supported. We also kept our Numba implementation to help users customize the transducer loss part if needed.

  6. Improved CTC segmentation.

  7. Fixed minor bugs and issues (e.g., fixed the MVDR beamformer).
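As referenced above, here is a minimal sketch of dynamic batching with the dedicated sampler (argument names are assumptions; see the tutorial for the full recipe):

```python
# Sketch: dynamic batching groups utterances of similar duration so that
# each batch has a bounded total length. Argument names are assumptions.
from speechbrain.dataio.batch import PaddedBatch
from speechbrain.dataio.dataloader import SaveableDataLoader
from speechbrain.dataio.sampler import DynamicBatchSampler

sampler = DynamicBatchSampler(
    dataset,                  # a DynamicItemDataset (placeholder, defined elsewhere)
    max_batch_length=100,     # cap on the summed durations per batch
    num_buckets=60,
    length_func=lambda x: x["duration"],
)
loader = SaveableDataLoader(dataset, batch_sampler=sampler, collate_fn=PaddedBatch)
```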

Let me thank all the amazing contributors for this achievement.
Please add a star to our project if you appreciate our efforts for the community.
Together, we are growing very fast, and we have big plans for the future.

Stay Tuned!

SpeechBrain v0.5.10

11 Sep 22:34

This version mainly expands the functionalities of SpeechBrain without adding any backward incompatibilities.

New Recipes:

  • Language Identification with CommonLanguage
  • EEG signal processing with ERPCore
  • Speech translation with Fisher-Call Home
  • Emotion Recognition with IEMOCAP
  • Voice Activity Detection with LibriParty
  • ASR with LibriSpeech wav2vec (WER=1.9 on test-clean)
  • Speech Enhancement with CoopNet
  • Speech Enhancement with SEGAN
  • Speech Separation with LibriMix, WHAM, and WHAMR
  • Support for guided attention
  • Spoken Language Understanding with SLURP

Beyond that, we fixed some minor bugs and issues.

v0.5.9

17 Jun 01:25

The main differences from the previous version are the following:

  • Added WHAM!, WHAMR!, and LibriMix recipes for speech separation
  • Compatibility with PyTorch 1.9
  • Fixed minor bugs
  • Added the SpeechBrain paper

v0.5.8

06 Jun 01:42

SpeechBrain 0.5.8 improves on the previous version in the following ways:

  • Added wav2vec support in TIMIT, CommonVoice, and AISHELL-1
  • Improved the Fluent Speech Commands recipe
  • Improved SLU recipes
  • Added a recipe for UrbanSound8k
  • Fixed small bugs
  • Fixed typos