ilovehackathons/kitt-chatbot

K.I.T.T. Chatbot

Did some prompt engineering during the K.I.T.T. hackathon at The Drivery in Berlin on 01.07.2023.


macOS quick start

  • brew update && brew upgrade && brew install portaudio # portaudio is required
  • python --version # Python 3.11.4
  • pip install -r requirements.txt
  • cp config.json.example config.json # Put your API credentials (OpenAI, AWS, Picovoice) there. Switch from 'AirPods Pro' to 'MacBook Air - Microphone' if necessary. You may also need to adjust the silence threshold (depends on your mic and environment).
  • python main.py # If it crashes, just press Ctrl-C and restart.
  • Say 'picovoice' (you can update the wake word in main.py).
  • Ask K.I.T.T. something, e.g. 'Take me to Las Vegas.' The response will be played through the speakers and written to the console.
  • Repeat the above two steps as often as you want.
  • You can update the prompt in chat_gpt_service.py.

K.I.T.T.

A photo of K.I.T.T.

Original README

Link to the demo video

Introduction

This project is a voice-activated Raspberry Pi-based system that listens for a wake word, "picovoice". Upon hearing the wake word, the system starts recording audio input until it detects silence. It then sends this audio input to OpenAI for transcription and further processing. The response from OpenAI is then converted to speech using Amazon Polly, a text-to-speech service, and played back to the user.
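The flow described above can be sketched as a simple loop. The function names below are illustrative stand-ins, not the actual code in this repository; each stage is passed in as a callable so the pipeline itself needs no microphone or API credentials:

```python
def run_assistant(wait_for_wake_word, record_until_silence,
                  transcribe, ask_llm, synthesize, play):
    """One illustrative pass of the K.I.T.T. pipeline.

    All six arguments are callables standing in for the real
    services (Porcupine, the recorder, OpenAI, Amazon Polly).
    """
    wait_for_wake_word()             # e.g. Porcupine blocking on "picovoice"
    audio = record_until_silence()   # raw PCM frames until RMS drops
    question = transcribe(audio)     # e.g. OpenAI transcription
    answer = ask_llm(question)       # e.g. ChatGPT with the K.I.T.T. prompt
    speech = synthesize(answer)      # e.g. Amazon Polly
    play(speech)                     # play back through the speakers
    return question, answer
```

With stub callables this runs end to end, which is also how the individual stages can be tested in isolation.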

Project Structure

The project is divided into four main Python files:

  1. main.py: The main script that integrates all other modules, listens for the wake word, manages the audio recording, and handles the interaction with OpenAI and AWS Polly.

  2. tts_service.py: Handles the text-to-speech conversion using Amazon Polly.

  3. input_listener.py: Handles the audio recording and silence detection.

  4. chat_gpt_service.py: Manages the interaction with OpenAI's GPT-3 model.

There is also a configuration file, config.json, which stores important parameters and keys.
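As a rough sketch of the kind of logic tts_service.py implements (the function names and the 'Joanna' voice are assumptions for illustration, not the repository's actual code):

```python
def build_polly_request(text, voice_id="Joanna", output_format="mp3"):
    """Assemble the keyword arguments for Polly's synthesize_speech call."""
    return {"Text": text, "VoiceId": voice_id, "OutputFormat": output_format}

def synthesize(polly_client, text):
    """Convert text to audio bytes via an Amazon Polly client.

    polly_client is expected to behave like boto3.client("polly"):
    synthesize_speech returns a dict whose "AudioStream" is a
    file-like object containing the encoded audio.
    """
    response = polly_client.synthesize_speech(**build_polly_request(text))
    return response["AudioStream"].read()
```

In the real service the client would be created with boto3.client("polly", aws_access_key_id=..., aws_secret_access_key=...) using the values from config.json.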

Prerequisites

This project requires Python 3.8 or higher. It also requires specific Python packages which are listed in the requirements.txt file. Install the required packages with the command:

pip install -r requirements.txt

In addition to these packages, this project also uses:

  • Porcupine (Picovoice's wake word engine): This is used to listen for the wake word.
  • OpenAI: This is used to transcribe and process the audio input. You will need an API key from OpenAI to use this service.
  • Amazon Polly: This is used to convert the response from OpenAI into speech. You will need AWS credentials to use this service.
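A minimal sketch of the wake-word loop, assuming a detector with Porcupine's process() semantics (it returns the index of the detected keyword for a match, or a negative value otherwise). In the real code the engine would be created with Picovoice's access key and fed frames from the microphone; here the detector is injected so the loop is testable:

```python
def wait_for_wake_word(detector, frames):
    """Feed audio frames to the detector until the wake word is heard.

    detector.process(frame) is assumed to follow Porcupine's contract:
    a non-negative keyword index on detection, -1 for no match.
    Returns the number of frames consumed, or None if the stream ends
    without a detection.
    """
    for i, frame in enumerate(frames, start=1):
        if detector.process(frame) >= 0:
            return i
    return None
```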

Configuration

All the keys and important parameters are stored in the config.json file. This includes:

  • OpenAI API key (openai_key): Used for interacting with OpenAI.
  • Porcupine Access Key (pv_access_key): Used for wake word detection.
  • AWS credentials (aws_access_key_id, aws_secret_access_key): Used for text-to-speech conversion with Amazon Polly.
  • Silence threshold (silence_threshold): The RMS threshold below which the input is considered silent.
  • Silence duration (silence_duration): The duration of silence (in seconds) after which the recording is stopped.
  • Sound card name (sound_card_name): The name of the sound card used for audio input.
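Putting these together, a config.json might look like the following. All values are placeholders for illustration; the exact key names should match config.json.example:

```json
{
  "openai_key": "sk-...",
  "pv_access_key": "...",
  "aws_access_key_id": "AKIA...",
  "aws_secret_access_key": "...",
  "silence_threshold": 500,
  "silence_duration": 2,
  "sound_card_name": "seeed-2mic-voicecard"
}
```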

Running the Project

To run the project, execute the main.py script:

python main.py

Obtaining Required Keys

This project requires keys from OpenAI, Picovoice, and AWS. Here is how to obtain them:

  1. OpenAI Key: Sign up at openai.com and follow the instructions here to obtain your secret API key.

  2. Picovoice Key: Sign up for free at picovoice.ai to obtain your key.

  3. AWS Keys: Sign up at AWS. You need to create an IAM user and obtain the access key and secret. Make sure to assign a policy that allows the user to use the Polly service.

After obtaining these keys, add them to your config.json file.

Silence Detection

The InputListener class is responsible for listening to the user's input and detecting silence. It uses the Root Mean Square (RMS) value of the audio signal to decide whether the user is speaking or not.

The threshold for silence can be adjusted in the config.json file using the silence_threshold parameter. The silence_duration parameter determines how long the silence must continue before the system decides that the user has finished speaking.

The correct values for these parameters may depend on the specific microphone and environment you are using. If you are unsure about the correct values, you can run the program and observe the RMS values that are printed to the console after the wake word is detected. Here is an example:

RMS: 347
RMS: 452
RMS: 458
RMS: 575
RMS: 392
RMS: 444
RMS: 474
RMS: 552
RMS: 304
RMS: 535
RMS: 456
RMS: 417
RMS: 226
RMS: 516
RMS: 523
RMS: 219
RMS: 296
RMS: 508
RMS: 375
RMS: 229
RMS: 439

By observing these values, you can get a sense of which RMS values correspond to speech and which correspond to silence, and adjust the silence_threshold and silence_duration parameters accordingly.
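The RMS check itself takes only a few lines. This is an illustrative stand-in for the logic in input_listener.py, assuming frames of signed 16-bit little-endian PCM:

```python
import struct

def rms(frame):
    """Root Mean Square of a frame of signed 16-bit little-endian PCM."""
    samples = struct.unpack("<%dh" % (len(frame) // 2), frame)
    return (sum(s * s for s in samples) / len(samples)) ** 0.5

def is_finished(rms_values, silence_threshold, silence_chunks):
    """True once the last `silence_chunks` RMS values are all below threshold."""
    tail = rms_values[-silence_chunks:]
    return len(tail) == silence_chunks and all(v < silence_threshold for v in tail)
```

In the real listener, silence_chunks would be derived from silence_duration, the sample rate, and the chunk size (chunks = duration * rate / chunk_size).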

Tested Environment and Installation Instructions

This project has been tested on a Raspberry Pi 4 using Raspberry Pi OS 64-bit (version 6.1, released on May 3rd, 2023). The SHA of the release is e7c0c89db32d457298fbe93195e9d11e3e6b4eb9e0683a7beb1598ea39a0a7aa.

We used the ReSpeaker 2-Mics Pi HAT as the sound card. More information about this sound card can be found here.

Raspberry Pi Setup

Below are the commands to set up the project on your Raspberry Pi:

  1. Install the ReSpeaker 2-Mics Pi HAT sound card:
sudo apt-get update -y
sudo apt-get install portaudio19-dev libatlas-base-dev -y
git clone https://github.com/seeed-studio-projects/seeed-voicecard.git
cd seeed-voicecard
sudo ./install.sh
sudo reboot now
  2. Clone the VoiceBotChatGPT-RaspberryPI repository:
git clone https://github.com/TamerinTECH/VoiceBotChatGPT-RaspberryPI.git
cd VoiceBotChatGPT-RaspberryPI
  3. Upgrade pip and install the required Python packages:
pip install --upgrade pip setuptools wheel
pip3 install -r requirements.txt

Now, your Raspberry Pi is set up to run the project. Remember to add your API keys to the config.json file before running the main.py script.

Setting Up the Default Audio Output Device

In some instances, you may need to manually select the default audio output device. Here is how you can do it:

  1. Open the Raspberry Pi configuration settings:
sudo raspi-config
  2. Navigate through the menu options: 1 System Options, then S2 Audio.
  3. Select the desired audio output option. In our case, it was the bcm2835-i2s... option corresponding to the ReSpeaker sound card.

After you've selected the appropriate option, the system should use this device as the default for audio output.

Limitations

Please note that this project was developed for hackathon and demo purposes. Therefore, there is no guarantee of its performance or functionality. For additional information, please contact TamerinTECH at voicebot@tamerin.tech.

Acknowledgements

This documentation was written by ChatGPT with some supervision by the author.