A simple command line tool for text to image generation, using OpenAI's CLIP and a BigGAN. Technique was originally created by https://twitter.com/advadnoun
Updated Feb 6, 2022 - Python
A Comparative Framework for Multimodal Recommender Systems
An official implementation for "CLIP4Clip: An Empirical Study of CLIP for End to End Video Clip Retrieval"
Automated modeling and machine learning framework FEDOT
A knowledge base construction engine for richly formatted data
Attention-based multimodal fusion for sentiment analysis
Effective prompting for Large Multimodal Models like GPT-4 Vision, LLaVA or CogVLM. 🔥
A CLI tool/Python module for generating images from text using guided diffusion and CLIP from OpenAI.
[TPAMI 2023] Multimodal Image Synthesis and Editing: The Generative AI Era
A collection of awesome papers on RAG for AIGC. We propose a taxonomy of RAG foundations, enhancements, and applications in the paper "Retrieval-Augmented Generation for AI-Generated Content: A Survey".
An official implementation for "UniVL: A Unified Video and Language Pre-Training Model for Multimodal Understanding and Generation"
The Cradle framework is a first attempt at General Computer Control (GCC). Cradle supports agents to ace any computer task by enabling strong reasoning abilities, self-improvement, and skill curation, in a standardized general environment with minimal requirements.
X-VLM: Multi-Grained Vision Language Pre-Training (ICML 2022)
Sequence-to-Sequence Framework in PyTorch
PyTorch implementation of CVPR 2020 paper “VectorNet: Encoding HD Maps and Agent Dynamics from Vectorized Representation”
Towards Generalist Biomedical AI
A Python framework accelerating ML-based discovery in the medical field by encouraging code reuse. Batteries included :)
DANCE: a deep learning library and benchmark platform for single-cell analysis
✨✨Woodpecker: Hallucination Correction for Multimodal Large Language Models. The first work to correct hallucinations in MLLMs.
GPT4RoI: Instruction Tuning Large Language Model on Region-of-Interest