Zero-shot robotic manipulation with pretrained image-editing diffusion models
If generalist robots are to operate in truly unstructured environments, they need to be able to
recognize and reason about novel objects and scenarios. Such objects and scenarios might …
Unleashing large-scale video generative pre-training for visual robot manipulation
Generative pre-trained models have demonstrated remarkable effectiveness in language
and vision domains by learning useful representations. In this paper, we extend the scope of …
Towards Generalist Robot Learning from Internet Video: A Survey
This survey presents an overview of methods for learning from video (LfV) in the context of
reinforcement learning (RL) and robotics. We focus on methods capable of scaling to large …
Integrating Reinforcement Learning with Foundation Models for Autonomous Robotics: Methods and Perspectives
Foundation models (FMs), large deep learning models pre-trained on vast, unlabeled
datasets, exhibit powerful capabilities in understanding complex patterns and generating …
DITTO: Demonstration Imitation by Trajectory Transformation
Teaching robots new skills quickly and conveniently is crucial for the broader adoption of
robotic systems. In this work, we address the problem of one-shot imitation from a single …
Diffusion imitation from observation
Learning from observation (LfO) aims to imitate experts by learning from state-only
demonstrations without requiring action labels. Existing adversarial imitation learning …
Pre-trained Visual Dynamics Representations for Efficient Policy Learning
Pre-training for Reinforcement Learning (RL) with purely video data is a valuable
yet challenging problem. Although in-the-wild videos are readily available and contain a vast …
Learning an actionable discrete diffusion policy via large-scale actionless video pre-training
Learning a generalist embodied agent capable of completing multiple tasks poses
challenges, primarily stemming from the scarcity of action-labeled robotic datasets. In …
DecisionNCE: Embodied multimodal representations via implicit preference learning
Multimodal pretraining has emerged as an effective strategy for the trinity of goals of
representation learning in autonomous robots: 1) extracting both local and global task …
Investigating Pre-Training Objectives for Generalization in Vision-Based Reinforcement Learning
Recently, various pre-training methods have been introduced in vision-based
Reinforcement Learning (RL). However, their generalization ability remains unclear due to …