README 🎁

About

This repo implements text-to-audio diffusion using a denoising U-Net and Meta's Encodec. The U-Net is trained to denoise Encodec's encoded codebooks, with T5 text embeddings as conditioning. Encodec's decoder then takes the denoised codebooks and decodes them into an uncompressed .wav file.
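To make the training objective more concrete, here is a minimal sketch of how a noisy training example could be built from a clean latent. It assumes a standard DDPM-style setup (linear beta schedule, noise-prediction target), which this README does not confirm; the function names and the four-element `latent` list are hypothetical stand-ins, not code from this repo.

```python
import math
import random


def linear_beta_schedule(num_steps, beta_start=1e-4, beta_end=0.02):
    """Linearly spaced noise variances (assumed schedule, not confirmed by the repo)."""
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * step for i in range(num_steps)]


def cumulative_alphas(betas):
    """alpha_bar[t] is the product of (1 - beta[s]) for all s <= t."""
    alpha_bars, running = [], 1.0
    for beta in betas:
        running *= 1.0 - beta
        alpha_bars.append(running)
    return alpha_bars


def add_noise(latent, t, alpha_bars, rng):
    """Return (noisy_latent, noise) for timestep t.

    Implements x_t = sqrt(alpha_bar_t) * x_0 + sqrt(1 - alpha_bar_t) * eps.
    """
    signal = math.sqrt(alpha_bars[t])
    sigma = math.sqrt(1.0 - alpha_bars[t])
    noise = [rng.gauss(0.0, 1.0) for _ in latent]
    noisy = [signal * x + sigma * e for x, e in zip(latent, noise)]
    return noisy, noise


rng = random.Random(0)
alpha_bars = cumulative_alphas(linear_beta_schedule(1000))

# Hypothetical stand-in for a flattened Encodec latent of one audio chunk.
latent = [0.5, -1.0, 0.25, 0.0]

t = rng.randrange(len(alpha_bars))
noisy_latent, target_noise = add_noise(latent, t, alpha_bars, rng)
# Under the assumptions above, a denoiser would be trained to predict
# target_noise from (noisy_latent, t, text_embedding).
```

In the real pipeline the latent would come from Encodec's encoder and the text embedding from T5; both are left out here so the sketch runs with the standard library alone.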

The architecture is by no means perfect; it is being actively tested and worked on. If you have suggestions for improvements to try, please don't hesitate to let us know!

Instructions

Clone the repo
Set up your environment
Launch train_latent_cond.py with accelerate (see example_launch_command.txt in the root directory for an example)
See training_args.md in the root directory for argument explanations
Inference scripts, notebooks, and trained models are coming soon
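If you would rather start training from Python than paste the command into a shell, a small helper along these lines could work. It is only a sketch: it assumes example_launch_command.txt holds a single shell-style command, possibly wrapped with trailing backslashes, which is not verified here, and `load_launch_command` is a hypothetical name rather than part of this repo.

```python
import shlex
from pathlib import Path


def load_launch_command(path="example_launch_command.txt"):
    """Read a launch command from a text file and split it into an argv list.

    Assumes the file contains one shell-style command; backslash-newline
    continuations are joined before splitting.
    """
    text = Path(path).read_text(encoding="utf-8")
    text = text.replace("\\\n", " ")  # join backslash-continued lines
    return shlex.split(text)
```

The returned list can be passed to `subprocess.run(cmd, check=True)` from the repo root once the environment with accelerate is active.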

Shout Outs

Thanks to Hugging Face for diffusers/transformers and for being a huge contributor to the open source community
Thanks to HarmonAI for their audio diffusion research and contributions to the open source community
Thanks to Stable Diffusion and OpenAI for the U-Net/cross-attention base code and for their open source contributions
Thanks to Meta for open sourcing Encodec and all of their other open source contributions
Thanks to Google for open sourcing the T5 large language model
Shoutout to EveryDream for the Windows venv setup and bnb patch

serp-ai/ai-text-to-audio-latent-diffusion
