multimodal
Here are 674 public repositories matching this topic...
Implementations of various ML tasks on the Kaggle platform with GPUs.
Updated Jun 1, 2024 - Jupyter Notebook
Auto-updated paper list
Updated Jun 1, 2024 - Python
A scalable generative AI framework built for researchers and developers working on Large Language Models, Multimodal, and Speech AI (Automatic Speech Recognition and Text-to-Speech)
Updated Jun 1, 2024 - Python
React component library for crafting user-friendly and engaging conversational experiences
Updated Jun 1, 2024 - TypeScript
A paper list on multimodal and large language models, used only to record papers I read from the daily arXiv listings for personal reference.
Updated May 31, 2024
Visualize streams of multimodal data. Fast, easy to use, and simple to integrate. Built in Rust using egui.
Updated May 31, 2024 - Rust
This is the official implementation of the paper "Enhancing Scene Graph Generation with Hierarchical Relationships and Commonsense Knowledge" in PyTorch.
Updated May 31, 2024 - Jupyter Notebook
🧠 | Multimodal Integration of Oncology Data System
Updated May 31, 2024 - JavaScript
Orchestrate swarms of agents from any framework, such as OpenAI and LangChain, for business operation automation. Join our community: https://discord.gg/DbjBMJTSWD
Updated May 31, 2024 - Python
VisualWebArena is a benchmark for multimodal agents.
Updated May 31, 2024 - Python
Seamlessly integrate state-of-the-art transformer models into robotics stacks
Updated May 31, 2024 - Python
This repository is used to collect papers and code in the field of AI.
Updated May 31, 2024
This repo contains evaluation code for the paper "MMMU: A Massive Multi-discipline Multimodal Understanding and Reasoning Benchmark for Expert AGI"
Updated May 31, 2024 - Python
Notes for software engineers getting up to speed on new AI developments. Serves as a datastore for https://latent.space writing and product brainstorming, with cleaned-up canonical references under the /Resources folder.
Updated May 31, 2024 - HTML
Tracks the latest AI multimodal models, including multimodal foundation models, LLMs, audio, image, video, music, and 3D content. 🔥
Updated May 31, 2024
The official implementation of "Instruction-Guided Visual Masking"
Updated May 31, 2024 - Jupyter Notebook
HuixiangDou: Overcoming Group Chat Scenarios with LLM-based Technical Assistance
Updated May 31, 2024 - Python
Multimodal computer vision application that combines object detection, gesture recognition, and speech-to-text to help users ask questions about their environment.
Updated May 31, 2024 - Python
Optimized local inference for LLMs with HuggingFace-like APIs for quantization, vision/language models, multimodal agents, speech, vector DB, and RAG.
Updated May 31, 2024 - Python