Zero-shot robotic manipulation with pretrained image-editing diffusion models

K Black, M Nakamoto, P Atreya, H Walke… - arXiv preprint arXiv …, 2023 - arxiv.org
If generalist robots are to operate in truly unstructured environments, they need to be able to
recognize and reason about novel objects and scenarios. Such objects and scenarios might …

Unleashing large-scale video generative pre-training for visual robot manipulation

H Wu, Y Jing, C Cheang, G Chen, J Xu, X Li… - arXiv preprint arXiv …, 2023 - arxiv.org
Generative pre-trained models have demonstrated remarkable effectiveness in language
and vision domains by learning useful representations. In this paper, we extend the scope of …

Towards Generalist Robot Learning from Internet Video: A Survey

R McCarthy, DCH Tan, D Schmidt, F Acero… - arXiv preprint arXiv …, 2024 - arxiv.org
This survey presents an overview of methods for learning from video (LfV) in the context of
reinforcement learning (RL) and robotics. We focus on methods capable of scaling to large …

Integrating Reinforcement Learning with Foundation Models for Autonomous Robotics: Methods and Perspectives

A Moroncelli, V Soni, AA Shahid, M Maccarini… - arXiv preprint arXiv …, 2024 - arxiv.org
Foundation models (FMs), large deep learning models pre-trained on vast, unlabeled
datasets, exhibit powerful capabilities in understanding complex patterns and generating …

DITTO: Demonstration Imitation by Trajectory Transformation

N Heppert, M Argus, T Welschehold, T Brox… - arXiv preprint arXiv …, 2024 - arxiv.org
Teaching robots new skills quickly and conveniently is crucial for the broader adoption of
robotic systems. In this work, we address the problem of one-shot imitation from a single …

Diffusion imitation from observation

BR Huang, CK Yang, CM Lai, DJ Wu… - arXiv preprint arXiv …, 2024 - arxiv.org
Learning from observation (LfO) aims to imitate experts by learning from state-only
demonstrations without requiring action labels. Existing adversarial imitation learning …

Pre-trained Visual Dynamics Representations for Efficient Policy Learning

H Luo, B Zhou, Z Lu - European Conference on Computer Vision, 2025 - Springer
Pre-training for Reinforcement Learning (RL) with purely video data is a valuable
yet challenging problem. Although in-the-wild videos are readily available and inhere a vast …

Learning an actionable discrete diffusion policy via large-scale actionless video pre-training

H He, C Bai, L Pan, W Zhang, B Zhao… - The Thirty-eighth Annual …, 2024 - openreview.net
Learning a generalist embodied agent capable of completing multiple tasks poses
challenges, primarily stemming from the scarcity of action-labeled robotic datasets. In …

DecisionNCE: Embodied multimodal representations via implicit preference learning

J Li, J Zheng, Y Zheng, L Mao, X Hu, S Cheng… - arXiv preprint arXiv …, 2024 - arxiv.org
Multimodal pretraining has emerged as an effective strategy for the trinity of goals of
representation learning in autonomous robots: 1) extracting both local and global task …

Investigating Pre-Training Objectives for Generalization in Vision-Based Reinforcement Learning

D Kim, H Lee, K Lee, D Hwang, J Choo - arXiv preprint arXiv:2406.06037, 2024 - arxiv.org
Recently, various pre-training methods have been introduced in vision-based
Reinforcement Learning (RL). However, their generalization ability remains unclear due to …