Answering Questions About Images

This is a Node.js app that lets you upload images, ask questions about them by voice, and listen to the spoken answers.

Voice to Text: I convert audio to text with Whisper, OpenAI's speech recognition model, which transcribes speech with up to 99% accuracy. Whisper comes from the creators of ChatGPT; the model is open source and free to use. It was trained on 680,000 hours of speech data collected from the web and recognizes 99 languages.

Generating Answers: I use the BLIP-2 model, which answers questions about images.

Text to Voice: I use gTTS.js, a JavaScript port of the Google Text-to-Speech library originally written in Python.
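
The three stages above chain into a single flow: audio prompt to question text, question plus image to answer text, answer text to audio. The following is only a minimal sketch of that flow, not the app's actual code. The names `answerImageQuestion`, `transcribe`, `answer`, and `speak` are hypothetical, and the stages are passed in as plain async functions so that no particular Whisper, BLIP-2, or gTTS.js call signature is assumed.

```javascript
// Hypothetical pipeline sketch: each stage is an injected async function,
// so any speech-to-text, image question answering, or text-to-speech
// backend can be plugged in without changing the flow.
async function answerImageQuestion({ imagePath, audioPath }, { transcribe, answer, speak }) {
  // 1. Voice to text: turn the recorded prompt into a question.
  const question = (await transcribe(audioPath)).trim();
  if (!question) {
    throw new Error("No speech was recognized in the audio prompt");
  }

  // 2. Generating answers: ask the question about the uploaded image.
  const text = await answer(imagePath, question);

  // 3. Text to voice: turn the answer into audio.
  const audio = await speak(text);

  return { question, text, audio };
}

// Stand-in stages for illustration only; real ones would call the models.
const demoStages = {
  transcribe: async () => "  What is in the picture?  ",
  answer: async (imagePath, question) => `Stub answer to "${question}" for ${imagePath}`,
  speak: async (text) => Buffer.from(text, "utf8"),
};

answerImageQuestion({ imagePath: "photo.jpg", audioPath: "prompt.webm" }, demoStages)
  .then((result) => console.log(result.question, "->", result.text));
```

Injecting the stages keeps the flow easy to test with stubs and makes it possible to swap a model without touching the orchestration.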

To get started:

       # clone the repository
       git clone https://github.com/Ashot72/Answering-Questions-About-Images
       cd Answering-Questions-About-Images

       # add your key to the .env file

       # install dependencies
       npm install

       # run locally
       npm start
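
A missing key usually surfaces only when the first request fails, so it can help to check for it at startup. The sketch below is an illustration, not part of the repository: `requireEnv` is a hypothetical helper, `API_KEY` is a placeholder because the actual variable name is not given here, and it assumes the values from `.env` have already been loaded into `process.env` (for example by the dotenv package).

```javascript
// Hypothetical helper: read a required setting and fail fast when it is
// absent or blank. The second parameter defaults to process.env and exists
// only to make the function easy to test.
function requireEnv(name, env = process.env) {
  const value = env[name];
  if (value === undefined || value.trim() === "") {
    throw new Error(`Missing ${name}: add it to the .env file before running npm start`);
  }
  return value.trim();
}

// Example with an explicit object instead of the real environment.
// "API_KEY" is a placeholder name, not the one the app necessarily uses.
const key = requireEnv("API_KEY", { API_KEY: "example-value" });
console.log(`Key loaded (${key.length} characters)`);
```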
      

Go to Answering Questions About Images Video page

Go to Answering Questions About Images Description page
