[CVPR 2024 Oral] InternVL Family: A Pioneering Open-Source Alternative to GPT-4V. A commercially usable open-source multimodal chat model approaching GPT-4V's performance.
MLX-VLM is a package for running Vision LLMs locally on your Mac using MLX.
CARES: A Comprehensive Benchmark of Trustworthiness in Medical Vision Language Models
[ACL 2024 Findings] Dataset and Code of "ImplicitAVE: An Open-Source Dataset and Multimodal LLMs Benchmark for Implicit Attribute Value Extraction"
FreeVA: Offline MLLM as Training-Free Video Assistant
日本語LLMまとめ - An overview of Japanese LLMs
Seamlessly integrate state-of-the-art transformer models into robotics stacks
InternLM-XComposer2 is a groundbreaking vision-language large model (VLLM) excelling in free-form text-image composition and comprehension.
A curated list of awesome knowledge-driven autonomous driving (continually updated)
Grounded Multimodal Large Language Model with Localized Visual Tokenization
Composition of Multimodal Language Models From Scratch
An official implementation of ShareGPT4V: Improving Large Multi-modal Models with Better Captions
Visualizing the attention of vision-language models
[ICML 2024] Visual-Text Cross Alignment: Refining the Similarity Score in Vision-Language Models
[ICML 2024] Unsupervised Adversarial Fine-Tuning of Vision Embeddings for Robust Large Vision-Language Models
A curated list of prompt learning methods for vision-language models.
[CVPR 2024] The official implementation of the paper "Synthesize, Diagnose, and Optimize: Towards Fine-Grained Vision-Language Understanding"
[CVPR 2024 🔥] Grounding Large Multimodal Model (GLaMM), the first-of-its-kind model capable of generating natural language responses that are seamlessly integrated with object segmentation masks.
Official PyTorch implementation and benchmark dataset for IGARSS 2024 ORAL paper: "Composed Image Retrieval for Remote Sensing"
[CVPR 2024 Highlight🔥] Chat-UniVi: Unified Visual Representation Empowers Large Language Models with Image and Video Understanding