feature request:please add voice in/out #4

Open
develperbayman opened this issue Feb 1, 2023 · 11 comments
Labels
Enhancement ✨ New feature or request good first issue 🥇 Good for newcomers

Comments

@develperbayman

The title covers it: it would be sweet to talk to it and get replies in audio.

@RaSan147
Owner

RaSan147 commented Feb 2, 2023

The web UI is under development; try this branch if you want voice:
https://github.com/RaSan147/VoiceAI-Asuna/tree/kivy-gui

I'm trying hard to replicate a cute voice, but online services require payment or an API key, and OS-based voices feel kind of off. So I'm looking into alternative ways now.

And as for the sweet talk part, I just jumped from the Python-based Kivy GUI to a server-based (self-made) web UI. There is a ton of groundwork to do before adding more commands, so things are getting a bit slow.

Sorry

@RaSan147
Owner

RaSan147 commented Feb 2, 2023

@develperbayman if you know any website that lets users generate voice WAV files via an API (for free), that would be a life-saving help 🛐🙇‍♂️

@RaSan147 RaSan147 added the good first issue 🥇 Good for newcomers label Feb 3, 2023
@RaSan147
Owner

RaSan147 commented Feb 3, 2023

@develperbayman could you check whether there's any voice at https://www.voicerss.org/api/demo.aspx
that you like (one that goes well with the character)?
I may be able to adjust pitch and speed to mimic some expression.
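
A rough sketch of how speed could be varied per expression when building a VoiceRSS request. It assumes the query parameters from the VoiceRSS HTTP API (`key`, `src`, `hl`, `r` for a speech rate between -10 and 10, `c` for the codec). The expression-to-rate table and the function name are made up for illustration, and pitch is left out because it would probably need post-processing of the returned audio.

```python
from urllib.parse import urlencode

# Hypothetical mapping from an expression to a VoiceRSS speech rate (-10..10).
EXPRESSION_RATE = {"calm": -2, "neutral": 0, "excited": 3}


def build_voicerss_url(text, expression="neutral",
                       api_key="YOUR_API_KEY", language="en-us"):
    """Build a VoiceRSS request URL that asks for WAV audio of `text`."""
    rate = EXPRESSION_RATE.get(expression, 0)
    rate = max(-10, min(10, rate))  # keep the rate inside the accepted range
    params = {"key": api_key, "src": text, "hl": language, "r": rate, "c": "WAV"}
    return "https://api.voicerss.org/?" + urlencode(params)


print(build_voicerss_url("Hello there", "excited"))
```

Fetching the printed URL with a valid key should return the audio bytes, which can then be saved as a `.wav` file.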

@RaSan147 RaSan147 added the Enhancement ✨ New feature or request label Feb 3, 2023
@develperbayman
Author

develperbayman commented Jun 19, 2023

Wow, I totally did not realize you replied. It's probably a little late (my apologies, I haven't been very active for a bit), but perhaps you'd be more interested in a TTS engine and an STT engine to accomplish this. I'm using them in Python for the AI script I'm working on. Take a peek:

@develperbayman
Author

develperbayman commented Jun 19, 2023

import threading
import time
import sys
import chat_commands
from gtts import gTTS
import os
import tkinter as tk
from tkinter import filedialog, messagebox
import speech_recognition as sr
import webbrowser
import re
import subprocess
import openai

doListenToCommand = True
listening = False

# List of common farewells that end the session
despedida = ["Goodbye", "goodbye", "bye", "Bye", "See you later", "see you later"]

# Create the GUI window
window = tk.Tk()
window.title("Computer: AI")
window.geometry("400x400")

# Create the text entry box
text_entry = tk.Entry(window, width=50)
text_entry.pack(side=tk.BOTTOM)

# Create the submit button
submit_button = tk.Button(window, text="Submit", command=lambda: submit())
submit_button.pack(side=tk.BOTTOM)

# Create the text output box
text_output = tk.Text(window, height=300, width=300)
text_output.pack(side=tk.BOTTOM)

# Set your OpenAI API key here
openai.api_key = "your_api_key_here"

def submit(event=None, text_input=None):
    global doListenToCommand
    global listening

    # Get the user input and check if the input matches the list of goodbyes
    if text_input is not None and text_input != "":
        usuario = text_input
    else:
        usuario = text_entry.get()

    if usuario in despedida:
        on_closing()
        return  # nothing to send to the API after a goodbye

    prompt = f"You are ChatGPT and answer my following message: {usuario}"

    # Getting responses using the OpenAI API
    # (legacy Completion endpoint from the pre-1.0 openai package)
    response = openai.Completion.create(
        engine="text-davinci-003",
        prompt=prompt,
        max_tokens=2049
    )

    respuesta = response["choices"][0]["text"]

    # Converting text to audio
    texto = str(respuesta)
    tts = gTTS(texto, lang='en', tld='ie')
    tts.save("audio.mp3")

    # Displaying the answer on the screen
    text_output.insert(tk.END, "ChatGPT: " + respuesta + "\n")

    # Clear the input text
    text_entry.delete(0, tk.END)

    # Playing the audio; listening is paused so the reply is not picked up
    doListenToCommand = False
    time.sleep(1)
    os.system("play audio.mp3")
    doListenToCommand = True

    # Call function to listen to the user
    if not listening:
        listen_to_command()

# Bind the Enter key to the submit function
window.bind("<Return>", submit)

def load_core_principles(file_path):
    with open(file_path, 'r') as file:
        principles = file.readlines()
    return principles

def listen_to_command():
    global doListenToCommand
    global listening

    # If we are not to be listening then exit the function.
    if not doListenToCommand:
        return

    # Initialize the recognizer
    r = sr.Recognizer()

    # Use the default microphone as the audio source
    with sr.Microphone() as source:
        print("Listening...")
        listening = True
        audio = r.listen(source)
        listening = False

    try:
        # Use speech recognition to convert speech to text
        command = r.recognize_google(audio)
        print("You said:", command)
        text_output.insert(tk.END, "You: " + command + "\n")
        text_entry.delete(0, tk.END)

        # Process the commands
        # Prepare object to be passed.
        class PassedCommands:
            tk = tk
            text_output = text_output
            submit = submit

        chat_commands.process_commands(PassedCommands, command)

    except sr.UnknownValueError:
        print("Speech recognition could not understand audio.")
    except sr.RequestError as e:
        print("Could not request results from Google Speech Recognition service:", str(e))

    listening = False
    listen_to_command()

def on_closing():
    if messagebox.askokcancel("Quit", "Do you want to quit?"):
        window.destroy()

window.protocol("WM_DELETE_WINDOW", on_closing)

if __name__ == "__main__":
    # Create the menu bar
    menu_bar = tk.Menu(window)

    # Create the "File" menu
    file_menu = tk.Menu(menu_bar, tearoff=0)
    file_menu.add_command(label="Open LLM", command=lambda: filedialog.askopenfilename())
    file_menu.add_command(label="Save LLM", command=lambda: filedialog.asksaveasfilename())
    file_menu.add_separator()
    file_menu.add_command(label="Exit", command=on_closing)
    menu_bar.add_cascade(label="File", menu=file_menu)

    # Create the "Run" menu
    # NOTE: run_as_normal_app and run_on_flask are not defined in this snippet,
    # so these two entries raise NameError when clicked.
    run_menu = tk.Menu(menu_bar, tearoff=0)
    run_menu.add_command(label="Run as normal app", command=lambda: threading.Thread(target=run_as_normal_app).start())
    run_menu.add_command(label="Run on Flask", command=lambda: threading.Thread(target=run_on_flask).start())
    menu_bar.add_cascade(label="Run", menu=run_menu)

    # Set the menu bar
    window.config(menu=menu_bar)

    # Start the main program loop
    start_listening_thread = threading.Thread(target=listen_to_command)
    start_listening_thread.daemon = True
    start_listening_thread.start()
    window.mainloop()
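
One thing to watch in this script: `listen_to_command` calls itself after every utterance, so a long session keeps deepening the call stack. Below is a minimal sketch of the same flow written as a loop. The recognizer and the handler are passed in as plain callables so it can be tried without a microphone; `listen_loop`, `recognize_once`, `handle`, and `should_listen` are hypothetical names, not part of speech_recognition.

```python
def listen_loop(recognize_once, handle, should_listen):
    """Keep recognizing utterances until should_listen() returns False.

    recognize_once() returns the recognized text, or None when the audio
    could not be understood. handle(text) processes one command.
    Returns the number of commands that were handled.
    """
    handled = 0
    while should_listen():
        text = recognize_once()
        if text is None:
            continue  # nothing understood; listen again
        handle(text)
        handled += 1
    return handled


# Tiny demo with canned "utterances" instead of a microphone.
utterances = iter(["computer hello", None, "open website example.com"])
remaining = [3]


def fake_recognize():
    remaining[0] -= 1
    return next(utterances)


seen = []
count = listen_loop(fake_recognize, seen.append, lambda: remaining[0] > 0)
print(count, seen)
```

In the real script, `recognize_once` would wrap `r.listen` plus `r.recognize_google`, and `should_listen` would check the `doListenToCommand` flag.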

@develperbayman
Author

develperbayman commented Jun 19, 2023

I hate markup, it never works for me. But yeah, it generates the MP3 automatically. This example uses OpenAI.
Actually, this script is complete; however, it relies on another Python script to supply any extra commands:

@develperbayman
Author

import subprocess
import webbrowser
import re
import validators
import sys


def process_commands(passed_commands, command):
    lowered = command.lower()
    handled = False

    if "computer" in lowered:
        print("Activated Command: Computer")
        passed_commands.text_output.insert(
            passed_commands.tk.END, "Activated Command: Computer" + "\n")
        passed_commands.submit(text_input=command)
        handled = True
        # listen_to_command()

    # Open a website
    if "open website" in lowered:
        # Take everything after the keyword. The position comes from the
        # lowercased copy so the match is case-insensitive, like the check above.
        start = lowered.find("open website") + len("open website")
        url = command[start:].strip()
        # Test for http:// or https:// and add http:// to the URL if missing.
        if not url.startswith("http://") and not url.startswith("https://"):
            url = "http://" + url

        print("Trying to open website: " + url)

        # Validating if the URL is correct
        if validators.url(url):
            webbrowser.open(url, new=0, autoraise=True)
            passed_commands.text_output.insert(
                passed_commands.tk.END, "Opening website: " + url + "\n")
        else:
            print("Invalid URL command. URL: " + url)
            passed_commands.text_output.insert(
                passed_commands.tk.END, "Invalid URL command. URL: " + url + "\n")

        return

    # Open an application
    if "run program" in lowered:
        # Extract the application name from the command
        start = lowered.find("run program") + len("run program")
        app_name = command[start:].strip()

        print("Trying to open program: " + app_name)

        try:
            subprocess.Popen(app_name)
            passed_commands.text_output.insert(
                passed_commands.tk.END, "Opening program: " + app_name + "\n")
        except FileNotFoundError:
            print("Program not found: " + app_name)
            passed_commands.text_output.insert(
                passed_commands.tk.END, "Program not found: " + app_name + "\n")

        return

    # Testing
    # Stop listening to the microphone
    if lowered == "stop listening":
        passed_commands.text_output.insert(
            passed_commands.tk.END, "Stopping the microphone." + "\n")
        # What goes here?
        return

    # Testing
    # Allow program exit via voice.
    if lowered == "stop program":
        passed_commands.text_output.insert(
            passed_commands.tk.END, "Stopping the program." + "\n")
        sys.exit()

    # Nothing above matched
    if not handled:
        print("Invalid command")
        passed_commands.text_output.insert(
            passed_commands.tk.END, "Invalid command" + "\n")
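
Keyword matching and argument extraction are easier to keep consistent when parsing is separated from acting on the command. Here is a small sketch of that idea, assuming the same spoken keywords as the helper script; `PREFIXES`, `parse_command`, and `normalize_url` are illustrative names that do not exist in the script.

```python
# Hypothetical keyword table: spoken prefix -> action name.
PREFIXES = {
    "open website": "open_website",
    "run program": "run_program",
    "stop listening": "stop_listening",
    "stop program": "stop_program",
}


def parse_command(command):
    """Return (action, argument) for a spoken command, or (None, "") if unknown."""
    lowered = command.lower()
    for prefix, action in PREFIXES.items():
        index = lowered.find(prefix)
        if index != -1:
            # Slice the original string so the argument keeps its case.
            argument = command[index + len(prefix):].strip()
            return action, argument
    return None, ""


def normalize_url(url):
    """Prepend http:// when the spoken URL has no scheme."""
    if not url.startswith(("http://", "https://")):
        url = "http://" + url
    return url


action, argument = parse_command("Please Open Website example.com")
print(action, normalize_url(argument))
```

The caller can then dispatch on `action` and only touch `webbrowser` or `subprocess` once the argument has been checked.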

@develperbayman
Author

Again, sorry for the very late reply, but this should get you started. Please let me know if it helps or if you do anything cool with it.

@develperbayman
Author

Next I'm working on a Hugging Face Transformers version so you can self-host your own model, but dear god, the hardware needed for that is insane.

@RaSan147
Owner

> next im working on a huggingface transformers version to self host your own model but dear god the hardware needed for that is insane

That's why I dropped all hope of running AI just for TTS.
I'll use edge_tts for speech output (halfway done).
As for voice recognition, it will run on the client side, so your OpenAI solution is no help here. I'll use JS speech recognition for voice-to-text (I still need to start working on it).

EDGE_TTS has a really good collection of voices, thank you Microsoft.
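
For anyone following along, a minimal edge_tts sketch might look like the following. It assumes the `edge-tts` package is installed, that the voice name `en-US-AriaNeural` is available, and that the machine has network access. The `rate_string` helper, `speak_to_file`, and the output file name are only illustrative.

```python
import asyncio


def rate_string(percent):
    """Format a speed change as a signed percent string such as '+10%'."""
    return f"{percent:+d}%"


async def speak_to_file(text, path="reply.mp3",
                        voice="en-US-AriaNeural", speed_percent=0):
    # Imported here so the helper above works without the package installed.
    import edge_tts

    communicate = edge_tts.Communicate(text, voice, rate=rate_string(speed_percent))
    await communicate.save(path)


# Example call (needs edge-tts and a network connection):
# asyncio.run(speak_to_file("Hello there!", speed_percent=10))
```

A small positive or negative `speed_percent` per reply is one cheap way to hint at expression without changing the voice.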

@develperbayman
Author

Maybe I'll switch to Edge. I'm very interested in better-sounding voice output.

2 participants