[CVPR 2024 Oral] InternVL Family: A Pioneering Open-Source Alternative to GPT-4V. A commercially usable open-source multimodal chat model approaching GPT-4V's performance.
MLX-VLM is a package for running Vision LLMs locally on your Mac using MLX.
CARES: A Comprehensive Benchmark of Trustworthiness in Medical Vision Language Models
[ACL 2024 Findings] Dataset and Code of "ImplicitAVE: An Open-Source Dataset and Multimodal LLMs Benchmark for Implicit Attribute Value Extraction"
FreeVA: Offline MLLM as Training-Free Video Assistant
日本語LLMまとめ - An overview of Japanese LLMs
Seamlessly integrate state-of-the-art transformer models into robotics stacks
InternLM-XComposer2 is a groundbreaking vision-language large model (VLLM) excelling in free-form text-image composition and comprehension.
A curated list of awesome knowledge-driven autonomous driving (continually updated)
Grounded Multimodal Large Language Model with Localized Visual Tokenization
Composition of Multimodal Language Models From Scratch
An official implementation of ShareGPT4V: Improving Large Multi-modal Models with Better Captions
Visualizing the attention of vision-language models
[ICML 2024] Visual-Text Cross Alignment: Refining the Similarity Score in Vision-Language Models
[ICML 2024] Unsupervised Adversarial Fine-Tuning of Vision Embeddings for Robust Large Vision-Language Models
A curated list of prompt learning methods for vision-language models.
[CVPR 2024] The official implementation of the paper "Synthesize, Diagnose, and Optimize: Towards Fine-Grained Vision-Language Understanding"
[CVPR 2024 🔥] Grounding Large Multimodal Model (GLaMM), the first-of-its-kind model capable of generating natural language responses that are seamlessly integrated with object segmentation masks.
Official PyTorch implementation and benchmark dataset for IGARSS 2024 ORAL paper: "Composed Image Retrieval for Remote Sensing"
[CVPR 2024 Highlight🔥] Chat-UniVi: Unified Visual Representation Empowers Large Language Models with Image and Video Understanding