A paper list of my reading history. Robotics, Learning, Vision.

YanjieZe/Paper-List
A Paper List of Yanjie Ze

Recent Random Papers

  • RSS 2024, 3D Diffusion Policy: Generalizable Visuomotor Policy Learning via Simple 3D Representations, Website
  • RSS 2024, RoboCasa: Large-Scale Simulation of Everyday Tasks for Generalist Robots, Website
  • CVPR 2024, OmniGlue: Generalizable Feature Matching with Foundation Model Guidance, Website
  • arXiv 2024.05, Pandora: Towards General World Model with Natural Language Actions and Video States, Website
  • arXiv 2024.05, Images that Sound: Composing Images and Sounds on a Single Canvas, Website
  • SIGGRAPH 2024, Text-to-Vector Generation with Neural Path Representation, Website
  • arXiv 2024.03, GeoWizard: Unleashing the Diffusion Priors for 3D Geometry Estimation from a Single Image, Website
  • arXiv 2024.05, Toon3D: Seeing Cartoons from a New Perspective, Website
  • arXiv 2024.05, TRANSIC: Sim-to-Real Policy Transfer by Learning from Online Correction, Website
  • RSS 2024, Natural Language Can Help Bridge the Sim2Real Gap, arXiv
  • ICML 2024, The Platonic Representation Hypothesis, arXiv
  • arXiv 2024.05, SPIN: Simultaneous Perception, Interaction and Navigation, Website
  • RSS 2024, Consistency Policy: Accelerated Visuomotor Policies via Consistency Distillation, Website
  • arXiv 2024.05, Humanoid Parkour Learning, Website
  • arXiv 2024.05, Evaluating Real-World Robot Manipulation Policies in Simulation, Website
  • arXiv 2024.05, ScrewMimic: Bimanual Imitation from Human Videos with Screw Space Projection, Website
  • arXiv 2024.04, DiffuseLoco: Real-Time Legged Locomotion Control with Diffusion from Offline Datasets, arXiv
  • arXiv 2024.05, DrEureka: Language Model Guided Sim-To-Real Transfer, Website
  • arXiv 2024.05, Customizing Text-to-Image Models with a Single Image Pair, Website
  • arXiv 2024.05, SATO: Stable Text-to-Motion Framework, arXiv
  • ICRA 2024, Learning Force Control for Legged Manipulation, arXiv
  • arXiv 2024.05, IntervenGen: Interventional Data Generation for Robust and Data-Efficient Robot Imitation Learning, arXiv
  • arXiv 2024.05, Track2Act: Predicting Point Tracks from Internet Videos enables Diverse Zero-shot Robot Manipulation, arXiv
  • arXiv 2024.04, KAN: Kolmogorov-Arnold Networks, arXiv
  • RSS 2023, IndustReal: Transferring Contact-Rich Assembly Tasks from Simulation to Reality, Website
  • arXiv 2024.04, Editable Image Elements for Controllable Synthesis, Website
  • arXiv 2024.04, EgoPet: Egomotion and Interaction Data from an Animal's Perspective, Website
  • SIGGRAPH 2023, OctFormer: Octree-based Transformers for 3D Point Clouds, Website
  • arXiv 2024.04, Clio: Real-time Task-Driven Open-Set 3D Scene Graphs, arXiv
  • ICCV 2023, Canonical Factors for Hybrid Neural Fields, Website
  • arXiv 2024.04, HATO: Learning Visuotactile Skills with Two Multifingered Hands, Website
  • arXiv 2024.04, SpringGrasp: Synthesizing Compliant Dexterous Grasps under Shape Uncertainty, Website
  • ICRA 2024 workshop, Object-Aware Gaussian Splatting for Robotic Manipulation, OpenReview
  • arXiv 2024.04, PhysDreamer: Physics-Based Interaction with 3D Objects via Video Generation, Website
  • arXiv 2015.09, MPPI: Model Predictive Path Integral Control using Covariance Variable Importance Sampling, arXiv
  • arXiv 2023.07, Sampling-based Model Predictive Control Leveraging Parallelizable Physics Simulations, arXiv / Github
  • arXiv 2024.04, BLINK: Multimodal Large Language Models Can See but Not Perceive, Website
  • arXiv 2024.04, Factorized Diffusion: Perceptual Illusions by Noise Decomposition, Website
  • CVPR 2024, Probing the 3D Awareness of Visual Foundation Models, arXiv
  • ICCV 2019, Neural-Guided RANSAC: Learning Where to Sample Model Hypotheses, arXiv
  • arXiv 2024.04, QuasiSim: Parameterized Quasi-Physical Simulators for Dexterous Manipulations Transfer, Website
  • arXiv 2024.04, Policy-Guided Diffusion, arXiv / Github
  • RoboSoft 2024, Body Design and Gait Generation of Chair-Type Asymmetrical Tripedal Low-rigidity Robot, Website
  • CVPR 2024 oral, MicKey: Matching 2D Images in 3D: Metric Relative Pose from Metric Correspondences, Website
  • arXiv 2024.04, ZeST: Zero-Shot Material Transfer from a Single Image, Website
  • arXiv 2024.03, Keypoint Action Tokens Enable In-Context Imitation Learning in Robotics, Website
  • arXiv 2024.04, Reconstructing Hand-Held Objects in 3D, arXiv
  • ICRA 2024, Actor-Critic Model Predictive Control, arXiv
  • arXiv 2024.04, Finding Visual Task Vectors, arXiv
  • NeurIPS 2022, Visual Prompting via Image Inpainting, arXiv
  • CVPR 2024 highlight, SpatialTracker: Tracking Any 2D Pixels in 3D Space, Website
  • CVPR 2024, NeRF2Physics: Physical Property Understanding from Language-Embedded Feature Fields, Website
  • CVPR 2024, Scaling Laws of Synthetic Images for Model Training ... for Now, arXiv
  • CVPR 2024, A Vision Check-up for Language Models, arXiv
  • CVPR 2024, GenH2R: Learning Generalizable Human-to-Robot Handover via Scalable Simulation, Demonstration, and Imitation, Website
  • arXiv 2024.04, PreAfford: Universal Affordance-Based Pre-Grasping for Diverse Objects and Environments, Website
  • CVPR 2024, Lift3D: Zero-Shot Lifting of Any 2D Vision Model to 3D, Website
  • arXiv 2024.03, LocoMan: Advancing Versatile Quadrupedal Dexterity with Lightweight Loco-Manipulators, arXiv
  • arXiv 2024.03, Leveraging Symmetry in RL-based Legged Locomotion Control, arXiv
  • arXiv 2024.03, RoboDuet: A Framework Affording Mobile-Manipulation and Cross-Embodiment, arXiv
  • arXiv 2024.03, Imitation Bootstrapped Reinforcement Learning, arXiv
  • arXiv 2024.03, Visual Whole-Body Control for Legged Loco-Manipulation, arXiv
  • arXiv 2024.03, S2: When Do We Not Need Larger Vision Models? arXiv
  • ICCV 2021, DPT: Vision Transformers for Dense Prediction, arXiv
  • arXiv 2024.03, GRM: Large Gaussian Reconstruction Model for Efficient 3D Reconstruction and Generation, Website
  • arXiv 2024.03, MVSplat: Efficient 3D Gaussian Splatting from Sparse Multi-View Images, Website
  • arXiv 2024.03, LiFT: A Surprisingly Simple Lightweight Feature Transform for Dense ViT Descriptors, Website
  • SIGGRAPH 2023, VET: Visual Error Tomography for Point Cloud Completion and High-Quality Neural Rendering, Github
  • arXiv 2024.03, On Pretraining Data Diversity for Self-Supervised Learning, arXiv
  • arXiv 2024.03, FeatUp: A Model-Agnostic Framework for Features at Any Resolution, arXiv / Github
  • arXiv 2024.03, Vid2Robot: End-to-end Video-conditioned Policy Learning with Cross-Attention Transformers, arXiv
  • arXiv 2024.03, Yell At Your Robot: Improving On-the-Fly from Language Corrections, arXiv
  • arXiv 2024.03, DROID: A Large-Scale In-the-Wild Robot Manipulation Dataset, Website
  • ICLR 2024 oral, Ghost on the Shell: An Expressive Representation of General 3D Shapes, Website
  • arXiv 2024.03, HumanoidBench: Simulated Humanoid Benchmark for Whole-Body Locomotion and Manipulation, Website
  • arXiv 2024.03, PaperBot: Learning to Design Real-World Tools Using Paper, arXiv
  • arXiv 2024.03, GaussianGrasper: 3D Language Gaussian Splatting for Open-vocabulary Robotic Grasping, arXiv
  • arXiv 2024.03, A Decade's Battle on Dataset Bias: Are We There Yet? arXiv
  • arXiv 2024.03, ManiGaussian: Dynamic Gaussian Splatting for Multi-task Robotic Manipulation, arXiv
  • arXiv 2024.03, Learning Generalizable Feature Fields for Mobile Manipulation, arXiv
  • arXiv 2024.03, DexCap: Scalable and Portable Mocap Data Collection System for Dexterous Manipulation, arXiv
  • arXiv 2024.03, TeleMoMa: A Modular and Versatile Teleoperation System for Mobile Manipulation, arXiv
  • arXiv 2024.03, OPEN TEACH: A Versatile Teleoperation System for Robotic Manipulation, arXiv
  • CVPR 2020 oral, SuperGlue: Learning Feature Matching with Graph Neural Networks, Github
  • ICRA 2024, Learning to walk in confined spaces using 3D representation, arXiv
  • CVPR 2024, Hierarchical Diffusion Policy for Kinematics-Aware Multi-Task Robotic Manipulation, arXiv / Website
  • arXiv 2024.03, Reconciling Reality through Simulation: A Real-to-Sim-to-Real Approach for Robust Manipulation, Website
  • ICRA 2024, Dexterous Legged Locomotion in Confined 3D Spaces with Reinforcement Learning, arXiv
  • arXiv 2024.03, MOKA: Open-Vocabulary Robotic Manipulation through Mark-Based Visual Prompting, Website
  • arXiv 2024.03, VQ-BeT: Behavior Generation with Latent Actions, arXiv / Website
  • Humanoid-Gym: Reinforcement Learning for Humanoid Robot with Zero-Shot Sim2Real Transfer, Website
  • arXiv 2024.03, Twisting Lids Off with Two Hands, Website
  • ICLR 2023 spotlight, Multi-skill Mobile Manipulation for Object Rearrangement, Github
  • CVPR 2024, Gaussian Splatting SLAM, Github
  • arXiv 2024.03, TripoSR: Fast 3D Object Reconstruction from a Single Image, Github
  • arXiv 2024.03, Point Cloud Mamba: Point Cloud Learning via State Space Model, arXiv
  • CVPR 2024, Rethinking Few-shot 3D Point Cloud Semantic Segmentation, arXiv
  • ICLR 2024, Can Transformers Capture Spatial Relations between Objects? arXiv / Website
  • SIGGRAPH Asia 2023, CamP: Camera Preconditioning for Neural Radiance Fields, Website / Github
  • arXiv 2024.02, Extreme Cross-Embodiment Learning for Manipulation and Navigation, Website
  • CVPR 2024, DUSt3R: Geometric 3D Vision Made Easy, Github
  • CVPR 2018 best paper, TASKONOMY: Disentangling Task Transfer Learning, Website
  • arXiv 2024.02, Mirage: Cross-Embodiment Zero-Shot Policy Transfer with Cross-Painting, Website
  • CVPR 2024, Diffusion 3D Features (Diff3F): Decorating Untextured Shapes with Distilled Semantic Features, Website
  • arXiv 2024.02, Disentangled 3D Scene Generation with Layout Learning, Website
  • arXiv 2024.02, Transparent Image Layer Diffusion using Latent Transparency, Website
  • arXiv 2024.02, Diffusion Meets DAgger: Supercharging Eye-in-hand Imitation Learning, Website
  • arXiv 2024.02, Massive Activations in Large Language Models, Website
  • arXiv 2024.02, Dynamics-Guided Diffusion Model for Robot Manipulator Design, Website
  • arXiv 2024.02, Genie: Generative Interactive Environments, arXiv / Website
  • arXiv 2024.02, CyberDemo: Augmenting Simulated Human Demonstration for Real-World Dexterous Manipulation, arXiv / Website
  • CoRL 2020, DSR: Learning 3D Dynamic Scene Representations for Robot Manipulation, Website
  • ICLR 2024 oral, Cameras as Rays: Pose Estimation via Ray Diffusion, Website
  • arXiv 2024.02, Pedipulate: Enabling Manipulation Skills using a Quadruped Robot's Leg, arXiv
  • arXiv 2024.02, LMPC: Learning to Learn Faster from Human Feedback with Language Model Predictive Control, Website
  • arXiv 2023.12, W.A.L.T: Photorealistic Video Generation with Diffusion Models, Website
  • arXiv 2024.02, Universal Manipulation Interface: In-The-Wild Robot Teaching Without In-The-Wild Robots, Website
  • ICCV 2023 oral, DiT: Scalable Diffusion Models with Transformers, Website
  • arXiv 2023.07, Diffusion Models Beat GANs on Image Classification, arXiv
  • ICCV 2023 oral, DDAE: Denoising Diffusion Autoencoders are Unified Self-supervised Learners, arXiv
  • arXiv 2023.12, Mosaic-SDF for 3D Generative Models, arXiv / Website
  • arXiv 2024.02, POCO: Policy Composition From and For Heterogeneous Robot Learning, Website
  • ICML 2024 submission, Latent Graph Diffusion: A Unified Framework for Generation and Prediction on Graphs, arXiv
  • ICLR 2024 spotlight, AMAGO: Scalable In-Context Reinforcement Learning for Adaptive Agents, arXiv
  • arXiv 2024.02, Offline Actor-Critic Reinforcement Learning Scales to Large Models, arXiv
  • arXiv 2024.02, V-IRL: Grounding Virtual Intelligence in Real Life, Website
  • ICRA 2024, SERL: A Software Suite for Sample-Efficient Robotic Reinforcement Learning, Website
  • arXiv 2024.01, Generative Expressive Robot Behaviors using Large Language Models, arXiv
  • arXiv 2024.01, pix2gestalt: Amodal Segmentation by Synthesizing Wholes, Website
  • arXiv 2024.01, DAE: Deconstructing Denoising Diffusion Models for Self-Supervised Learning, arXiv
  • ICLR 2024, DittoGym: Learning to Control Soft Shape-Shifting Robots, Website
  • arXiv 2024.01, WildRGB-D: RGBD Objects in the Wild: Scaling Real-World 3D Object Learning from RGB-D Videos, Website
  • arXiv 2024.01, Spatial VLM: Endowing Vision-Language Models with Spatial Reasoning Capabilities, Website
  • arXiv 2024.01, Multimodal Visual-Tactile Representation Learning through Self-Supervised Contrastive Pre-Training, arXiv
  • arXiv 2024.01, OK-Robot: What Really Matters in Integrating Open-Knowledge Models for Robotics, Website
  • L4DC 2023, Agile Catching with Whole-Body MPC and Blackbox Policy Learning, arXiv
  • arXiv 2024.01, Depth Anything: Unleashing the Power of Large-Scale Unlabeled Data, Github
  • arXiv 2024.01, WorldDreamer: Towards General World Models for Video Generation via Predicting Masked Tokens, Website
  • arXiv 2024.01, VMamba: Visual State Space Model, Github
  • arXiv 2024.01, DiffusionGPT: LLM-Driven Text-to-Image Generation System, arXiv / Website
  • arXiv 2023.12, PhysHOI: Physics-Based Imitation of Dynamic Human-Object Interaction, Website
  • ICLR 2024 oral, UniSim: Learning Interactive Real-World Simulators, OpenReview
  • ICLR 2024 oral, ASID: Active Exploration for System Identification and Reconstruction in Robotic Manipulation, OpenReview
  • ICLR 2024 oral, Mastering Memory Tasks with World Models, OpenReview
  • ICLR 2024 oral, Predictive auxiliary objectives in deep RL mimic learning in the brain, OpenReview
  • ICLR 2024 oral, Is ImageNet worth 1 video? Learning strong image encoders from 1 long unlabelled video, arXiv / OpenReview
  • arXiv 2024.01, URHand: Universal Relightable Hands, Website
  • arXiv 2023.12, Mamba: Linear-Time Sequence Modeling with Selective State Spaces, arXiv / Github
  • ICLR 2022, S4: Efficiently Modeling Long Sequences with Structured State Spaces, arXiv
  • arXiv 2024.01, Dr2Net: Dynamic Reversible Dual-Residual Networks for Memory-Efficient Finetuning, arXiv
  • arXiv 2023.12, 3D-LFM: Lifting Foundation Model, Website
  • arXiv 2024.01, DVT: Denoising Vision Transformers, Website
  • arXiv 2024.01, Open-Vocabulary SAM: Segment and Recognize Twenty-thousand Classes Interactively, Website / Code
  • arXiv 2024.01, ATM: Any-point Trajectory Modeling for Policy Learning, Website
  • CVPR 2024 submission, Learning Vision from Models Rivals Learning Vision from Data, arXiv / Github
  • CVPR 2024 submission, Visual Point Cloud Forecasting enables Scalable Autonomous Driving, arXiv / Github
  • CVPR 2024 submission, Ponymation: Learning 3D Animal Motions from Unlabeled Online Videos, arXiv / Website
  • CVPR 2024 submission, V*: Guided Visual Search as a Core Mechanism in Multimodal LLMs, Website
  • NeurIPS 2021 outstanding paper, Deep Reinforcement Learning at the Edge of the Statistical Precipice, arXiv / Website
  • CVPR 2024 submission, Zero-Shot Metric Depth with a Field-of-View Conditioned Diffusion Model, Website
  • ICLR 2023, Deep Learning on 3D Neural Fields, arXiv
  • CVPR 2024 submission, Tracking Any Object Amodally, Website
  • CVPR 2024 submission, MobileSAMv2: Faster Segment Anything to Everything, Github
  • CVPR 2024 submission, AnyDoor: Zero-shot Object-level Image Customization, Github
  • CVPR 2024 submission, Point Transformer V3: Simpler, Faster, Stronger, arXiv / Github
  • CVPR 2024 submission, Alchemist: Parametric Control of Material Properties with Diffusion Models, Website
  • CVPR 2024 submission, Reconstructing Hands in 3D with Transformers, Website
  • CVPR 2024 submission, Language-Informed Visual Concept Learning, Website
  • CVPR 2024 submission, RCG: Self-conditioned Image Generation via Generating Representations, arXiv / Github
  • CVPR 2024 submission, Describing Differences in Image Sets with Natural Language, Website
  • CVPR 2024 submission, FaceStudio: Put Your Face Everywhere in Seconds, Website
  • CVPR 2024 submission, ImageDream: Image-Prompt Multi-view Diffusion for 3D Generation, Website
  • CVPR 2024 submission, Fine-grained Controllable Video Generation via Object Appearance and Context, Website
  • CVPR 2024 submission, AmbiGen: Generating Ambigrams from Pre-trained Diffusion Model, Website
  • CVPR 2024 submission, ReconFusion: 3D Reconstruction with Diffusion Priors, Website
  • CVPR 2024 submission, Ego-Exo4D: Understanding Skilled Human Activity from First- and Third-Person Perspectives, arXiv / Website
  • CVPR 2024 submission, MagicAnimate: Temporally Consistent Human Image Animation using Diffusion Model, Github
  • CVPR 2024 submission, VideoSwap: Customized Video Subject Swapping with Interactive Semantic Point Correspondence, Website
  • CVPR 2024 submission, IMProv: Inpainting-based Multimodal Prompting for Computer Vision Tasks, Website
  • CVPR 2024 submission, Generative Powers of Ten, Website
  • CVPR 2024 submission, DiffiT: Diffusion Vision Transformers for Image Generation, arXiv
  • CVPR 2024 submission, Learning from One Continuous Video Stream, arXiv
  • CVPR 2024 submission, EvE: Exploiting Generative Priors for Radiance Field Enrichment, Website
  • CVPR 2024 submission, Oryon: Open-Vocabulary Object 6D Pose Estimation, Website
  • CVPR 2024 submission, Dense Optical Tracking: Connecting the Dots, Website
  • CVPR 2024 submission, Sequential Modeling Enables Scalable Learning for Large Vision Models, Website
  • CVPR 2024 submission, VideoBooth: Diffusion-based Video Generation with Image Prompts, Website
  • CVPR 2024 submission, SODA: Bottleneck Diffusion Models for Representation Learning, Website
  • CVPR 2024 submission, Exploiting Diffusion Prior for Generalizable Pixel-Level Semantic Prediction, Website
  • arXiv 2023.11, Initializing Models with Larger Ones, arXiv
  • CVPR 2024 submission, Animate Anyone: Consistent and Controllable Image-to-Video Synthesis for Character Animation, Website / Github
  • CVPR 2023 best demo award, Diffusion Illusions: Hiding Images in Plain Sight, Website
  • CVPR 2024 submission, Do text-free diffusion models learn discriminative visual representations? Website
  • CVPR 2024 submission, Visual Anagrams: Synthesizing Multi-View Optical Illusions with Diffusion Models, Website
  • NeurIPS 2023, Provable Guarantees for Generative Behavior Cloning: Bridging Low-Level Stability and High-Level Behavior, OpenReview
  • CoRL 2023 best paper, Distilled Feature Fields Enable Few-Shot Language-Guided Manipulation, Website
  • ICLR 2024 submission, RLIF: Interactive Imitation Learning as Reinforcement Learning, Website / arXiv
  • CVPR 2024 submission, PIE-NeRF: Physics-based Interactive Elastodynamics with NeRF, arXiv
  • RSS 2018, Asymmetric Actor Critic for Image-Based Robot Learning, arXiv
  • ICLR 2022, RvS: What is Essential for Offline RL via Supervised Learning?, arXiv
  • NeurIPS 2021, Stochastic Solutions for Linear Inverse Problems using the Prior Implicit in a Denoiser, arXiv
  • ICLR 2024 submission, Consistency Models as a Rich and Efficient Policy Class for Reinforcement Learning, OpenReview / arXiv
  • ICLR 2024 submission, Improved Techniques for Training Consistency Models, OpenReview / arXiv
  • ICLR 2024 submission, Privileged Sensing Scaffolds Reinforcement Learning, OpenReview
  • ICLR 2024 submission, SafeDiffuser: Safe Planning with Diffusion Probabilistic Models, arXiv / Website
  • NeurIPS 2023 workshop, Vision-Language Models Provide Promptable Representations for Reinforcement Learning, OpenReview
  • ICLR 2023 oral, Extreme Q-Learning: MaxEnt RL without Entropy, Website
  • ICLR 2024 submission, Generalization in diffusion models arises from geometry-adaptive harmonic representation, OpenReview
  • ICLR 2024 submission, DiffTOP: Differentiable Trajectory Optimization as a Policy Class for Reinforcement and Imitation Learning, OpenReview
  • CoRL 2023 best system paper, RoboCook: Long-Horizon Elasto-Plastic Object Manipulation with Diverse Tools, Website
  • CoRL 2023, Learning to Design and Use Tools for Robotic Manipulation, Website
  • arXiv 2023.10, Learning to (Learn at Test Time), arXiv / Github
  • CoRL 2023 workshop, FMB: a Functional Manipulation Benchmark for Generalizable Robotic Learning, OpenReview / Website
  • arXiv 2023.10, Non-parametric regression for robot learning on manifolds, arXiv
  • IROS 2021, Explaining the Decisions of Deep Policy Networks for Robotic Manipulations, arXiv
  • ICML 2022, The primacy bias in deep reinforcement learning, arXiv
  • ICML 2023 oral, The dormant neuron phenomenon in deep reinforcement learning, arXiv
  • arXiv 2022.04, Simplicial Embeddings in Self-Supervised Learning and Downstream Classification, arXiv
  • arXiv 2023.10, SparseDFF: Sparse-View Feature Distillation for One-Shot Dexterous Manipulation, arXiv
  • arXiv 2023.10, SAM-CLIP: Merging Vision Foundation Models towards Semantic and Spatial Understanding, arXiv
  • arXiv 2023.10, TD-MPC2: Scalable, Robust World Models for Continuous Control, arXiv / Github
  • arXiv 2023.10, EquivAct: SIM(3)-Equivariant Visuomotor Policies beyond Rigid Object Manipulation, Website
  • NeurIPS 2022, CodeRL: Mastering Code Generation through Pretrained Models and Deep Reinforcement Learning, arXiv / Github
  • arXiv 2023.10, Robot Fine-Tuning Made Easy: Pre-Training Rewards and Policies for Autonomous Real-World Reinforcement Learning, Website
  • CoRL 2023, SAQ: Action-Quantized Offline Reinforcement Learning for Robotic Skill Learning, Website
  • arXiv 2023.10, Mastering Robot Manipulation with Multimodal Prompts through Pretraining and Multi-task Fine-tuning, arXiv
  • arXiv 2023.03, PLEX: Making the Most of the Available Data for Robotic Manipulation Pretraining, arXiv
  • arXiv 2023.10, LAMP: Learn A Motion Pattern for Few-Shot-Based Video Generation, Website
  • arXiv 2023.10, 4K4D: Real-Time 4D View Synthesis at 4K Resolution, Website
  • arXiv 2023.10, SuSIE: Subgoal Synthesis via Image Editing, Website
  • arXiv 2023.10, Universal Visual Decomposer: Long-Horizon Manipulation Made Easy, Website
  • arXiv 2023.10, Learning to Act from Actionless Video through Dense Correspondences, Website
  • NeurIPS 2023, CEC: Cross-Episodic Curriculum for Transformer Agents, Website
  • ICLR 2024 submission, TD-MPC2: Scalable, Robust World Models for Continuous Control, OpenReview
  • ICLR 2024 submission, 3D Diffuser Actor: Multi-task 3D Robot Manipulation with Iterative Error Feedback, OpenReview
  • ICLR 2024 submission, NeRFuser: Diffusion Guided Multi-Task 3D Policy Learning, OpenReview
  • arXiv 2023.10, Foundation Reinforcement Learning: towards Embodied Generalist Agents with Foundation Prior Assistance, arXiv
  • ICCV 2023, S3IM: Stochastic Structural SIMilarity and Its Unreasonable Effectiveness for Neural Fields, Website
  • arXiv 2023.09, Text2Reward: Automated Dense Reward Function Generation for Reinforcement Learning, Website / arXiv
  • ICCV 2023, End2End Multi-View Feature Matching with Differentiable Pose Optimization, Website
  • arXiv 2023.10, Aligning Text-to-Image Diffusion Models with Reward Backpropagation, Website / Github
  • NeurIPS 2023, EDP: Efficient Diffusion Policies for Offline Reinforcement Learning, arXiv / Github
  • arXiv 2023.09, See to Touch: Learning Tactile Dexterity through Visual Incentives, arXiv / Website
  • RSS 2023, SAM-RL: Sensing-Aware Model-Based Reinforcement Learning via Differentiable Physics-Based Simulation and Rendering, arXiv / Website
  • arXiv 2023.09, MoDem-V2: Visuo-Motor World Models for Real-World Robot Manipulation, arXiv / Website
  • arXiv 2023.09, DreamGaussian: Generative Gaussian Splatting for Efficient 3D Content Creation, Website / Github
  • arXiv 2023.09, D3Fields: Dynamic 3D Descriptor Fields for Zero-Shot Generalizable Robotic Manipulation, Website / Github
  • arXiv 2023.09, GELLO: A General, Low-Cost, and Intuitive Teleoperation Framework for Robot Manipulators, Website / arXiv
  • arXiv 2023.09, Human-Assisted Continual Robot Learning with Foundation Models, Website / arXiv
  • arXiv 2023.09, Robotic Offline RL from Internet Videos via Value-Function Pre-Training, arXiv / Website
  • ICCV 2023, PointOdyssey: A Large-Scale Synthetic Dataset for Long-Term Point Tracking, arXiv / Github
  • arXiv 2023, Compositional Foundation Models for Hierarchical Planning, Website
  • RSS 2022 Best Student Paper Award Finalist, ACID: Action-Conditional Implicit Visual Dynamics for Deformable Object Manipulation, Website
  • CoRL 2023, REBOOT: Reuse Data for Bootstrapping Efficient Real-World Dexterous Manipulation, arXiv / Website
  • CoRL 2023, An Unbiased Look at Datasets for Visuo-Motor Pre-Training, OpenReview
  • CoRL 2023, Q-Transformer: Scalable Offline Reinforcement Learning via Autoregressive Q-Functions, OpenReview
  • ICCV 2023 oral, Tracking Everything Everywhere All at Once, Website
  • arXiv 2023.08, RoboTAP: Tracking Arbitrary Points for Few-Shot Visual Imitation, arXiv / Website
  • arXiv 2023.06, DreamSim: Learning New Dimensions of Human Visual Similarity using Synthetic Data, arXiv / Website
  • ICLR 2023 spotlight, FluidLab: A Differentiable Environment for Benchmarking Complex Fluid Manipulation, Website
  • arXiv 2023.06, Seal: Segment Any Point Cloud Sequences by Distilling Vision Foundation Models, arXiv / Website / Github
  • arXiv 2023.08, BridgeData V2: A Dataset for Robot Learning at Scale, arXiv / Website
  • arXiv 2023.08, Diffusion with Forward Models: Solving Stochastic Inverse Problems Without Direct Supervision, Website
  • ICML 2023, QRL: Optimal Goal-Reaching Reinforcement Learning via Quasimetric Learning, Website / Github
  • arXiv 2023.08, Dynamic 3D Gaussians: Tracking by Persistent Dynamic View Synthesis, Website
  • SIGGRAPH 2023 best paper, 3D Gaussian Splatting for Real-Time Radiance Field Rendering, Website
  • CoRL 2022, In-Hand Object Rotation via Rapid Motor Adaptation, arXiv / Website
  • ICLR 2019, DPI-Net: Learning Particle Dynamics for Manipulating Rigid Bodies, Deformable Objects, and Fluids, Website
  • ICLR 2019, Plan Online, Learn Offline: Efficient Learning and Exploration via Model-Based Control, arXiv / Website
  • NeurIPS 2021 spotlight, NeuS: Learning Neural Implicit Surfaces by Volume Rendering for Multi-view Reconstruction, Website
  • ICCV 2023, Unsupervised Compositional Concepts Discovery with Text-to-Image Generative Models, Website
  • AAAI 2018, FiLM: Visual Reasoning with a General Conditioning Layer, arXiv
  • arXiv 2023.08, RoboAgent: Towards Sample Efficient Robot Manipulation with Semantic Augmentations and Action Chunking, Website
  • ICRA 2000, RRT-Connect: An Efficient Approach to Single-Query Path Planning, PDF
  • CVPR 2017 oral, Network Dissection: Quantifying Interpretability of Deep Visual Representations, arXiv / Website
  • NeurIPS 2020 spotlight, Fourier Features Let Networks Learn High Frequency Functions in Low Dimensional Domains, Website
  • ICRA 1992, Planning optimal grasps, PDF
  • RSS 2021, GIGA: Synergies Between Affordance and Geometry: 6-DoF Grasp Detection via Implicit Representations, arXiv / Website
  • ECCV 2022, StARformer: Transformer with State-Action-Reward Representations for Visual Reinforcement Learning, arXiv / Github
  • ICML 2023, Parallel Q-Learning: Scaling Off-policy Reinforcement Learning under Massively Parallel Simulation, arXiv / Github
  • ECCV 2022, SeedFormer: Patch Seeds based Point Cloud Completion with Upsample Transformer, arXiv / Github
  • arXiv 2023.07, Waypoint-Based Imitation Learning for Robotic Manipulation, Website
  • ICML 2022, Prompt-DT: Prompting Decision Transformer for Few-Shot Policy Generalization, Website
  • arXiv 2023, Reinforcement Learning from Passive Data via Latent Intentions, Website
  • ICML 2023, RPG: Reparameterized Policy Learning for Multimodal Trajectory Optimization, Website
  • ICML 2023, TGRL: An Algorithm for Teacher Guided Reinforcement Learning, Website
  • arXiv 2023.07, XSkill: Cross Embodiment Skill Discovery, Website / arXiv
  • ICML 2023, Learning Neural Constitutive Laws: From Motion Observations for Generalizable PDE Dynamics, Website / Github
  • arXiv 2023.07, TokenFlow: Consistent Diffusion Features for Consistent Video Editing, Website
  • arXiv 2023.07, PAPR: Proximity Attention Point Rendering, Website / arXiv
  • ICCV 2023, DreamTeacher: Pretraining Image Backbones with Deep Generative Models, Website / arXiv
  • RSS 2023, Robust and Versatile Bipedal Jumping Control through Reinforcement Learning, arXiv
  • arXiv 2023.07, Differentiable Blocks World: Qualitative 3D Decomposition by Rendering Primitives, Website / arXiv
  • ICLR 2023, DexDeform: Dexterous Deformable Object Manipulation with Human Demonstrations and Differentiable Physics, Website / Github
  • arXiv 2023.07, RPDiff: Shelving, Stacking, Hanging: Relational Pose Diffusion for Multi-modal Rearrangement, Website / Github
  • arXiv 2023.07, SpawnNet: Learning Generalizable Visuomotor Skills from Pre-trained Networks, Website / Github
  • RSS 2023, DexPBT: Scaling up Dexterous Manipulation for Hand-Arm Systems with Population Based Training, Website / arXiv
  • arXiv 2023.07, KITE: Keypoint-Conditioned Policies for Semantic Manipulation, Website / arXiv
  • arXiv 2023.06, Detector-Free Structure from Motion, Website / arXiv
  • arXiv 2023.06, REFLECT: Summarizing Robot Experiences for FaiLure Explanation and CorrecTion, arXiv / Website
  • arXiv 2023.06, ViNT: A Foundation Model for Visual Navigation, Website
  • AAAI 2023, Improving Long-Horizon Imitation Through Instruction Prediction, arXiv / Github
  • arXiv 2023.06, RVT: Robotic View Transformer for 3D Object Manipulation, Website
  • arXiv 2023.01, Ponder: Point Cloud Pre-training via Neural Rendering, arXiv
  • arXiv 2023.06, SGR: A Universal Semantic-Geometric Representation for Robotic Manipulation, arXiv / Website
  • arXiv 2023.06, Robot Learning with Sensorimotor Pre-training, arXiv / Website
  • arXiv 2023.06, For SALE: State-Action Representation Learning for Deep Reinforcement Learning, arXiv / Github
  • arXiv 2023.06, HomeRobot: Open Vocabulary Mobile Manipulation, Website
  • arXiv 2023.06, Lifelike Agility and Play on Quadrupedal Robots using Reinforcement Learning and Deep Pre-trained Models, Website
  • arXiv 2023.06, TAPIR: Tracking Any Point with per-frame Initialization and temporal Refinement, Website
  • CVPR 2017, I3D: Quo Vadis, Action Recognition? A New Model and the Kinetics Dataset, arXiv
  • arXiv 2023.06, Diffusion Models for Zero-Shot Open-Vocabulary Segmentation, Website
  • arXiv 2023.06, R-MAE: Regions Meet Masked Autoencoders, arXiv / Github
  • arXiv 2023.05, Optimus: Imitating Task and Motion Planning with Visuomotor Transformers, Website
  • arXiv 2023.05, Video Prediction Models as Rewards for Reinforcement Learning, arXiv / Website
  • ICML 2023, VIMA: General Robot Manipulation with Multimodal Prompts, Website / Github
  • arXiv 2023.05, SPRING: GPT-4 Out-performs RL Algorithms by Studying Papers and Reasoning, arXiv
  • arXiv 2023.05, Training Diffusion Models with Reinforcement Learning, Website
  • arXiv 2023.03, Foundation Models for Decision Making: Problems, Methods, and Opportunities, arXiv
  • ICLR 2017, Third-Person Imitation Learning, arXiv
  • arXiv 2023.04, CoTPC: Chain-of-Thought Predictive Control, Website
  • CVPR 2023 highlight, ImageBind: One embedding to bind them all, Website / Github
  • arXiv 2023.05, Shap-E: Generating Conditional 3D Implicit Functions, Github
  • arXiv 2023.04, Track Anything: Segment Anything Meets Videos, Github
  • CVPR 2023, GLaD: Generalizing Dataset Distillation via Deep Generative Prior, Website
  • CVPR 2022 oral, RegNeRF: Regularizing Neural Radiance Fields for View Synthesis from Sparse Inputs, Website
  • CVPR 2023, FreeNeRF: Improving Few-shot Neural Rendering with Free Frequency Regularization, Website / Github
  • ICLR 2023 oral, Decision-Diffuser: Is Conditional Generative Modeling all you need for Decision-Making?, Website
  • CVPR 2022, Depth-supervised NeRF: Fewer Views and Faster Training for Free, Website
  • SIGGRAPH Asia 2022, ENeRF: Efficient Neural Radiance Fields for Interactive Free-viewpoint Video, Website
  • ICML 2023, On the Power of Foundation Models, arXiv
  • ICML 2023, SNeRL: Semantic-aware Neural Radiance Fields for Reinforcement Learning, Website
  • ICLR 2023 outstanding paper, Emergence of Maps in the Memories of Blind Navigation Agents, OpenReview
  • ICLR 2023 outstanding paper honorable mentions, Disentanglement with Biological Constraints: A Theory of Functional Cell Types, OpenReview
  • CVPR 2023 award candidate, Data-driven Feature Tracking for Event Cameras, arXiv
  • CVPR 2023 award candidate, What Can Human Sketches Do for Object Detection?, Website
  • CVPR 2023 award candidate, Visual Programming: Compositional Visual Reasoning Without Training, Website
  • CVPR 2023 award candidate, On Distillation of Guided Diffusion Models, arXiv
  • CVPR 2023 award candidate, DreamBooth: Fine Tuning Text-to-Image Diffusion Models for Subject-Driven Generation, Website
  • CVPR 2023 award candidate, Planning-oriented Autonomous Driving, Github
  • CVPR 2023 award candidate, Neural Dynamic Image-Based Rendering, Website
  • CVPR 2023 award candidate, MobileNeRF: Exploiting the Polygon Rasterization Pipeline for Efficient Neural Field Rendering on Mobile Architectures, Website
  • CVPR 2023 award candidate, OmniObject3D: Large Vocabulary 3D Object Dataset for Realistic Perception, Reconstruction and Generation, Website
  • CVPR 2023 award candidate, Ego-Body Pose Estimation via Ego-Head Pose Estimation, Website
  • CVPR 2023, Affordances from Human Videos as a Versatile Representation for Robotics, Website
  • CVPR 2022, Neural 3D Video Synthesis from Multi-view Video, Website
  • ICCV 2021, Nerfies: Deformable Neural Radiance Fields, Website / Github
  • CVPR 2023 highlight, HyperReel: High-Fidelity 6-DoF Video with Ray-Conditioned Sampling, Website / Github
  • arXiv 2022.05, FlashAttention: Fast and Memory-Efficient Exact Attention with IO-Awareness, arXiv / Github
  • CVPR 2023, CLIP^2: Contrastive Language-Image-Point Pretraining from Real-World Point Cloud Data, arXiv
  • CVPR 2023, ULIP: Learning a Unified Representation of Language, Images, and Point Clouds for 3D Understanding, arXiv / Github
  • CVPR 2023, Learning Video Representations from Large Language Models, Website / Github
  • CVPR 2023, PLA: Language-Driven Open-Vocabulary 3D Scene Understanding, Website
  • CVPR 2023, PartSLIP: Low-Shot Part Segmentation for 3D Point Clouds via Pretrained Image-Language Models, arXiv
  • CVPR 2023, Mask-Free Video Instance Segmentation, Website / Github
  • arXiv 2023.04, DINOv2: Learning Robust Visual Features without Supervision, arXiv / Github
  • arXiv 2023.04, Zip-NeRF: Anti-Aliased Grid-Based Neural Radiance Fields, Website
  • arXiv 2023.04, SEEM: Segment Everything Everywhere All at Once, arXiv / Github
  • arXiv 2023.04, Internet Explorer: Targeted Representation Learning on the Open Web, Website / Github
  • arXiv 2023.03, Consistency Models, arXiv / Github
  • arXiv 2023.02, SceneDreamer: Unbounded 3D Scene Generation from 2D Image Collections, Website / Github
  • arXiv 2023.04, Generative Agents: Interactive Simulacra of Human Behavior, arXiv
  • ICLR 2023 notable, NTFields: Neural Time Fields for Physics-Informed Robot Motion Planning, OpenReview
  • arXiv 2023, For Pre-Trained Vision Models in Motor Control, Not All Policy Learning Methods are Created Equal, arXiv
  • arXiv 2023, Instruct-NeRF2NeRF: Editing 3D Scenes with Instructions, GitHub
  • arXiv 2023, Grounding DINO: Marrying DINO with Grounded Pre-Training for Open-Set Object Detection, arXiv / GitHub
  • arXiv 2023, Zero-1-to-3: Zero-shot One Image to 3D Object, arXiv
  • ICLR 2023, Towards Stable Test-Time Adaptation in Dynamic Wild World, arXiv
  • CVPR 2023 highlight, Neural Volumetric Memory for Visual Locomotion Control, Website
  • arXiv 2023, Segment Anything, Website
  • ICRA 2023, DribbleBot: Dynamic Legged Manipulation in the Wild, Website
  • arXiv 2023, Alpaca: A Strong, Replicable Instruction-Following Model, Website
  • arXiv 2023, VC-1: Where are we in the search for an Artificial Visual Cortex for Embodied Intelligence?, Website
  • ICLR 2022, DroQ: Dropout Q-Functions for Doubly Efficient Reinforcement Learning, arXiv
  • arXiv 2023, RoboPianist: A Benchmark for High-Dimensional Robot Control, Website
  • ICLR 2021, DDIM: Denoising Diffusion Implicit Models, arXiv
  • arXiv 2023, Your Diffusion Model is Secretly a Zero-Shot Classifier, Website
  • CVPR 2023 highlight, F2-NeRF: Fast Neural Radiance Field Training with Free Camera Trajectories, Website
  • arXiv 2023, Learning Fine-Grained Bimanual Manipulation with Low-Cost Hardware, Website
  • RSS 2021, RMA: Rapid Motor Adaptation for Legged Robots, Website
  • ICCV 2021, Where2Act: From Pixels to Actions for Articulated 3D Objects, Website
  • CVPR 2019 oral, Semantic Image Synthesis with Spatially-Adaptive Normalization, GitHub

Contact

If you have any questions or suggestions, please feel free to contact me at lastyanjieze@gmail.com.
