Awesome-Multimodal-Chatbot

Awesome Multimodal Assistant is a curated list of multimodal chatbots and conversational assistants that combine multiple modes of interaction, such as text, speech, images, and video, to provide a seamless and versatile user experience. These systems are designed to help users with tasks ranging from simple information retrieval to complex multimedia reasoning.

Multimodal Instruction Tuning

  • MultiInstruct: Improving Multi-Modal Zero-Shot Learning via Instruction Tuning

    arXiv 2022/12 [paper]

  • GPT-4

    arXiv 2023/03 [paper] [blog]

  • Visual Instruction Tuning

    arXiv 2023/04 [paper] [code] [project page] [demo]

  • MiniGPT-4: Enhancing Vision-Language Understanding with Advanced Large Language Models

    arXiv 2023/04 [paper] [code] [project page] [demo]

  • mPLUG-Owl: Modularization Empowers Large Language Models with Multimodality

    arXiv 2023/04 [paper] [code] [demo]

  • LLaMA-Adapter V2: Parameter-Efficient Visual Instruction Model

    arXiv 2023/04 [paper] [code] [demo]

  • Video-LLaMA: An Instruction-Finetuned Visual Language Model for Video Understanding

    [code]

  • LMEye: An Interactive Perception Network for Large Language Models

    arXiv 2023/05 [paper] [code]

  • MultiModal-GPT: A Vision and Language Model for Dialogue with Humans

    arXiv 2023/05 [paper] [code] [demo]

  • X-LLM: Bootstrapping Advanced Large Language Models by Treating Multi-Modalities as Foreign Languages

    arXiv 2023/05 [paper] [code] [project page]

  • Otter: A Multi-Modal Model with In-Context Instruction Tuning

    arXiv 2023/05 [paper] [code] [demo]

  • InstructBLIP: Towards General-purpose Vision-Language Models with Instruction Tuning

    arXiv 2023/05 [paper] [code]

  • InternGPT: Solving Vision-Centric Tasks by Interacting with ChatGPT Beyond Language

    arXiv 2023/05 [paper] [code] [demo]

  • VisionLLM: Large Language Model is also an Open-Ended Decoder for Vision-Centric Tasks

    arXiv 2023/05 [paper] [code]

  • Cheap and Quick: Efficient Vision-Language Instruction Tuning for Large Language Models

    arXiv 2023/05 [paper] [code] [project page]

  • EmbodiedGPT: Vision-Language Pre-Training via Embodied Chain of Thought

    arXiv 2023/05 [paper] [code] [project page]

  • DetGPT: Detect What You Need via Reasoning

    arXiv 2023/05 [paper] [code] [project page]

  • PathAsst: Redefining Pathology through Generative Foundation AI Assistant for Pathology

    arXiv 2023/05 [paper] [code]

  • ChatBridge: Bridging Modalities with Large Language Model as a Language Catalyst

    arXiv 2023/05 [paper] [code] [project page]

  • Video-ChatGPT: Towards Detailed Video Understanding via Large Vision and Language Models

    arXiv 2023/06 [paper] [code]

  • LAMM: Language-Assisted Multi-Modal Instruction-Tuning Dataset, Framework, and Benchmark

    arXiv 2023/06 [paper]

  • Accountable Textual-Visual Chat Learns to Reject Human Instructions in Image Re-creation

    arXiv 2023/06 [paper] [project page]

  • Valley: Video Assistant with Large Language Model Enhanced Ability

    arXiv 2023/06 [paper] [code]

LLM-Based Modularized Frameworks

  • Visual ChatGPT: Talking, Drawing and Editing with Visual Foundation Models

    arXiv 2023/03 [paper] [code] [demo]

  • ViperGPT: Visual Inference via Python Execution for Reasoning

    arXiv 2023/03 [paper] [code] [project page]

  • TaskMatrix.AI: Completing Tasks by Connecting Foundation Models with Millions of APIs

    arXiv 2023/03 [paper] [code]

  • ChatGPT Asks, BLIP-2 Answers: Automatic Questioning Towards Enriched Visual Descriptions

    arXiv 2023/03 [paper] [code]

  • MM-REACT: Prompting ChatGPT for Multimodal Reasoning and Action

    arXiv 2023/03 [paper] [code] [project page] [demo]

  • HuggingGPT: Solving AI Tasks with ChatGPT and its Friends in Hugging Face

    arXiv 2023/03 [paper] [code] [demo]

  • VLog: Video as a Long Document

    [code] [demo]

  • Video ChatCaptioner: Towards Enriched Spatiotemporal Descriptions

    arXiv 2023/04 [paper] [code]

  • ChatVideo: A Tracklet-centric Multimodal and Versatile Video Understanding System

    arXiv 2023/04 [paper] [project page]

  • VideoChat: Chat-Centric Video Understanding

    arXiv 2023/05 [paper] [code] [demo]
