VisionScriptBot

A Telegram bot that uses Google's Gemini Pro Vision API. Try a demo here. The new version supports prompts alongside images: add your prompt as the image caption before uploading the image.
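The caption-as-prompt behaviour could be sketched roughly as follows. This is a minimal illustration, not the bot's actual code: `resolve_prompt` and `DEFAULT_PROMPT` are hypothetical names, and the fallback text is an assumption.

```python
from typing import Optional

# Hypothetical fallback used when a photo arrives without a caption
# (an assumption, not taken from the bot's source).
DEFAULT_PROMPT = "Describe this image."


def resolve_prompt(caption: Optional[str]) -> str:
    """Return the user's caption as the prompt, or a default if it is empty."""
    if caption is None:
        return DEFAULT_PROMPT
    cleaned = caption.strip()
    return cleaned if cleaned else DEFAULT_PROMPT
```

In a Telegram handler, the caption of the incoming photo message would be passed to this helper before the image is sent to the model.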

Gemini Pro Vision

Gemini Pro Vision is a large multimodal model in the Gemini family that accepts text and visual input (images and video) and generates relevant text responses.

Gemini Pro Vision is a foundation model that performs well at a variety of multimodal tasks such as visual understanding, classification, summarization, and creating content from images and video. It's adept at processing visual and text inputs such as photographs, documents, infographics, and screenshots.

Gemini API

VisionScriptBot uses Google's new Gemini Pro model.

Gemini is Google's latest family of large language models.

API KEY

You need a Google API key 🔐 for Gemini to run this model. Get your API key from https://makersuite.google.com/app/apikey

Google's Python SDK for the Gemini API is contained in the google-generativeai package. Install the dependency using pip:

pip install -q -U google-generativeai

for complete guide refer
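Once the package is installed, a call to the model might look like the sketch below. It is an assumption-laden example rather than the bot's own code: the environment variable name `GOOGLE_API_KEY` and the helpers `load_api_key` and `describe_image` are hypothetical, and Pillow is assumed to be installed for image loading.

```python
import os


def load_api_key(env_var: str = "GOOGLE_API_KEY") -> str:
    """Read the API key from the environment; fail early if it is missing."""
    # The variable name is an assumption; use whatever your deployment defines.
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(f"Set the {env_var} environment variable first.")
    return key


def describe_image(image_path: str, prompt: str) -> str:
    """Send one image and a text prompt to Gemini Pro Vision; return the reply text."""
    # Imported lazily so the key helper above works without the SDK installed.
    import google.generativeai as genai
    from PIL import Image  # Pillow, assumed to be installed separately

    genai.configure(api_key=load_api_key())
    model = genai.GenerativeModel("gemini-pro-vision")
    with Image.open(image_path) as image:
        response = model.generate_content([prompt, image])
    return response.text
```

For example, `describe_image("photo.jpg", "What is in this picture?")` would return the model's text answer, provided the key is valid and the file exists.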

Deploy

Deployed on Railway.app. Do check out their free hosting plans here.

Use cases

Visual information seeking: Use external knowledge combined with information extracted from the input image or video to answer questions.
Object recognition: Answer questions related to fine-grained identification of the objects in images and videos.
Digital content understanding: Answer questions and extract information from visual content like infographics, charts, figures, tables, and web pages.
Structured content generation: Generate responses based on multimodal inputs in formats like HTML and JSON.
Captioning and description: Generate descriptions of images and videos with varying levels of detail.
Reasoning: Compositionally infer new information without memorization or retrieval.
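For the structured content generation use case, a reply that was requested as JSON still arrives as plain text and has to be parsed. The helper below is a small sketch of that step; `parse_json_reply` is a hypothetical name, and the fence-stripping is an assumption that the reply may be wrapped in a Markdown code fence.

```python
import json
from typing import Any

# Three backticks, built programmatically to keep this example easy to embed.
FENCE = "`" * 3


def parse_json_reply(reply: str) -> Any:
    """Parse a model reply as JSON, tolerating a surrounding Markdown code fence."""
    text = reply.strip()
    if text.startswith(FENCE):
        lines = text.splitlines()[1:]  # drop the opening fence line
        if lines and lines[-1].strip() == FENCE:
            lines = lines[:-1]  # drop the closing fence line
        text = "\n".join(lines)
    return json.loads(text)
```

If the reply is not valid JSON, `json.loads` raises `json.JSONDecodeError`, which the caller can catch and report back to the user.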

Demo

Support

If you find this project useful, do support me here.

