multimodal

NVIDIA / NeMo

A scalable generative AI framework built for researchers and developers working on Large Language Models, Multimodal, and Speech AI (Automatic Speech Recognition and Text-to-Speech)

machine-translation tts speech-synthesis neural-networks deeplearning speaker-recognition asr multimodal speech-translation large-language-models speaker-diariazation generative-ai

Updated Jun 1, 2024
Python

rustic-ai / ui-components

React component library for crafting user-friendly and engaging conversational experiences

chat ai reactjs mui reactjs-components conversational-ai multimodal

Updated Jun 1, 2024
TypeScript

Yangyi-Chen / Multimodal-AND-Large-Language-Models

A list of papers on multimodal and large language models, used only to record the papers I read from the daily arXiv listings for personal reference.

machine-learning multimodal large-language-models general-purpose-model

Updated May 31, 2024

rerun-io / rerun

Visualize streams of multimodal data. Fast, easy to use, and simple to integrate. Built in Rust using egui.

visualization python rust computer-vision cpp robotics multimodal

Updated May 31, 2024
Rust

bowen-upenn / scene_graph_commonsense

This is the official implementation of the paper "Enhancing Scene Graph Generation with Hierarchical Relationships and Commonsense Knowledge" in PyTorch.

machine-learning computer-vision deep-learning scene-graph commonsense-reasoning visual-genome multimodal scene-understanding scene-graph-generation large-language-models visual-relation-reasoning

Updated May 31, 2024
Jupyter Notebook

lab-rasool / MINDS

🧠 | Multimodal Integration of Oncology Data System

data machine-learning deep-learning cancer nih oncology multimodal gdc-portal

Updated May 31, 2024
JavaScript

kyegomez / swarms

Orchestrate swarms of agents from any framework, such as OpenAI and LangChain, for business operation automation. Join our community: https://discord.gg/DbjBMJTSWD

Updated May 31, 2024
Python

web-arena-x / visualwebarena

VisualWebArena is a benchmark for multimodal agents.

agents multimodal llm

Updated May 31, 2024
Python

MbodiAI / mbodied-agents

Seamlessly integrate state-of-the-art transformer models into robotics stacks

robotics artificial-intelligence transformer agents diffusion vlm multimodal large-language-models llm generative-ai vision-language-model

Updated May 31, 2024
Python

songqiang321 / Awesome-AI-Papers

This repository is used to collect papers and code in the field of AI.

Updated May 31, 2024

MMMU-Benchmark / MMMU

This repo contains evaluation code for the paper "MMMU: A Massive Multi-discipline Multimodal Understanding and Reasoning Benchmark for Expert AGI"

machine-learning natural-language-processing deep-neural-networks computer-vision deep-learning evaluation question-answering stem multimodality multimodal-learning visual-question-answering multimodal multimodal-deep-learning foundation-models large-language-models llm llms large-multimodal-models

Updated May 31, 2024
Python

swyxio / ai-notes

Notes for software engineers getting up to speed on new AI developments. Serves as the datastore for https://latent.space writing and product brainstorming, with cleaned-up canonical references under the /Resources folder.

ai openai gpt multimodal gpt-3 prompt-engineering stable-diffusion

Updated May 31, 2024
HTML

Yuan-ManX / ai-multimodal-timeline

Here we will track the latest AI Multimodal Models, including Multimodal Foundation Models, LLM, Audio, Image, Video, Music and 3D content. 🔥

ai multi-modal deeplearning-ai multimodal multimodal-deep-learning llm

Updated May 31, 2024

2toinf / IVM

The official implementation of "Instruction-Guided Visual Masking"

computer-vision deep-learning robotics multimodal pytorch-implementation instruction-following large-language-models instruction-tuning large-multimodal-models

Updated May 31, 2024
Jupyter Notebook

InternLM / HuixiangDou

HuixiangDou: Overcoming Group Chat Scenarios with LLM-based Technical Assistance

application ocr robot pipeline dsl chatbot wechat assistance lark multimodal rag llm

Updated May 31, 2024
Python

darmangerd / vubot

A multimodal computer vision application that combines object detection, gesture recognition, and speech-to-text to help users ask questions about their environment.

computer-vision speech-recognition object-detection gesture-recognition multimodal multimodal-deep-learning

Updated May 31, 2024
Python

dusty-nv / NanoLLM

Optimized local inference for LLMs with HuggingFace-like APIs for quantization, vision/language models, multimodal agents, speech, vector DB, and RAG.

speech multimodal rag edge-ai vector-database vision-transformer llm-inference

Updated May 31, 2024
Python

Add this topic to your repo

To associate your repository with the multimodal topic, visit your repo's landing page and select "manage topics."

Here are 674 public repositories matching this topic...

modelscope / swift

Aisuko / notebooks

isLinXu / paper-list

NVIDIA / NeMo
