Jupyter notebook for turning textual dialogue into voice audio.

olaviinha/NeuralDialogueAudiolizer

Neural Dialogue Audiolizer

Neural Dialogue Audiolizer is a ".txt to .wav converter" that turns textual dialogue (e.g. an interview, a chat) between two individuals into audio dialogue with two freely selectable voices, currently by using any of the following APIs: Google Cloud TTS, Amazon Polly TTS or Microsoft Azure TTS.

It was made to run in Google Colaboratory (i.e. your browser), using your Google Drive as data source and storage.

Audio demos

| Source text | Google Cloud TTS | Amazon Polly TTS | Microsoft Azure TTS |
| --- | --- | --- | --- |
| gpt-3_chat-1.txt | WAV (loser) | WAV | WAV (winner) |

API access

Valid access keys are required to use any of the provided TTS APIs. See each provider's documentation for more information on obtaining access.

Note that in all of these services, neural voices are available only in specific regions. Select the location accordingly when enabling the service/API where necessary.

Note that costs may apply. At the time of writing, to the best of my knowledge, account creation for all of these services as well as limited monthly usage of these TTS APIs is free of charge, even though billing/credit card information may be required upon registration. You should also be aware that each line in each text file you audiolize consumes one TTS API call. TODO: consume only 2 API calls and slice+merge returned audio files in Colab.
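To get a rough idea of usage before running a file, the one-call-per-line rule above can be sketched as follows. This is only an illustration, not code from the notebook: the function name `estimate_usage` is hypothetical, and it assumes that empty separator lines do not trigger a call. The character count is included because TTS quotas are commonly expressed in characters; check your provider's pricing page for the actual limits.

```python
def estimate_usage(text):
    """Estimate TTS usage for a dialogue text.

    Assumption (not confirmed by the notebook): only non-empty lines
    are sent to the API, one call per line.
    """
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    return {
        "api_calls": len(lines),                        # one call per non-empty line
        "characters": sum(len(line) for line in lines)  # total characters sent
    }


if __name__ == "__main__":
    sample = "A: hi\n\nB: hello\n"
    print(estimate_usage(sample))
```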

Input text

Input should be the path to a .txt file located in your Google Drive, containing the dialogue in one of the following formats, with no other text. If your input material is a copy-paste from the interwebs, make sure to clean it up first so that it strictly follows one of these formats.

  1. question_and_answer expects an empty line every time the speaker changes. See example
  2. dialogue_with_names expects Name: (e.g. John: Hello Bob! How are you?) every time the speaker changes. The speaker is switched regardless of the name at the beginning, i.e. if there are two consecutive lines beginning with John:, the notebook will still interpret the second as Bob, and your result will be messed up. This will be improved in the distant future, perhaps. See example
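The two formats above can be sketched in Python roughly as follows. This is a minimal illustration under stated assumptions, not the notebook's actual parser: the function names are hypothetical, speakers are represented as 0 and 1, and in dialogue_with_names every non-empty line is assumed to start with `Name:`. As described above, the name itself is ignored and the speaker simply alternates.

```python
import re


def parse_question_and_answer(text):
    """Split on empty lines; blocks alternate between speaker 0 and 1."""
    blocks = [b for b in re.split(r"\n\s*\n", text) if b.strip()]
    # Collapse internal line breaks so each block becomes one utterance.
    return [(i % 2, " ".join(b.split())) for i, b in enumerate(blocks)]


def parse_dialogue_with_names(text):
    """Treat each non-empty line as one turn; speakers alternate.

    Assumption: every line begins with 'Name:'. The name is stripped
    and NOT used to pick the speaker, mirroring the behaviour described
    in the README (two consecutive 'John:' lines still alternate).
    """
    turns = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        utterance = re.sub(r"^[^:]+:\s*", "", line, count=1)
        turns.append((len(turns) % 2, utterance))
    return turns


if __name__ == "__main__":
    print(parse_question_and_answer("What is AI?\n\nA field of study."))
    print(parse_dialogue_with_names("John: Hello Bob! How are you?\nBob: Fine."))
```

Keeping the speaker as a plain alternating index makes the limitation visible: a file where the same person speaks on two consecutive lines has to be merged into one line by hand before it is audiolized.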

Languages

This notebook has only English and Finnish voices by default. To add other languages, add the correct language names to the p1_voice and p2_voice menus from the Google Cloud TTS voice list, Amazon Polly TTS voice list or Microsoft Azure TTS voice list.


Run NeuralDialogueAudiolizer.ipynb