Robotic offline RL from internet videos via value-function pre-training

K Black, M Nakamoto, P Atreya, H Walke… - arXiv preprint arXiv …, 2023 - arxiv.org

If generalist robots are to operate in truly unstructured environments, they need to be able to
recognize and reason about novel objects and scenarios. Such objects and scenarios might …

Cited by 83 · Related articles · All 4 versions

Unleashing large-scale video generative pre-training for visual robot manipulation

H Wu, Y Jing, C Cheang, G Chen, J Xu, X Li… - arXiv preprint arXiv …, 2023 - arxiv.org

Generative pre-trained models have demonstrated remarkable effectiveness in language
and vision domains by learning useful representations. In this paper, we extend the scope of …

Cited by 54 · Related articles · All 3 versions

Towards Generalist Robot Learning from Internet Video: A Survey

R McCarthy, DCH Tan, D Schmidt, F Acero… - arXiv preprint arXiv …, 2024 - arxiv.org

This survey presents an overview of methods for learning from video (LfV) in the context of
reinforcement learning (RL) and robotics. We focus on methods capable of scaling to large …

Cited by 6 · Related articles · All 2 versions

Integrating Reinforcement Learning with Foundation Models for Autonomous Robotics: Methods and Perspectives

A Moroncelli, V Soni, AA Shahid, M Maccarini… - arXiv preprint arXiv …, 2024 - arxiv.org

Foundation models (FMs), large deep learning models pre-trained on vast, unlabeled
datasets, exhibit powerful capabilities in understanding complex patterns and generating …

DITTO: Demonstration Imitation by Trajectory Transformation

N Heppert, M Argus, T Welschehold, T Brox… - arXiv preprint arXiv …, 2024 - arxiv.org

Teaching robots new skills quickly and conveniently is crucial for the broader adoption of
robotic systems. In this work, we address the problem of one-shot imitation from a single …

Cited by 8 · Related articles · All 2 versions

Diffusion imitation from observation

BR Huang, CK Yang, CM Lai, DJ Wu… - arXiv preprint arXiv …, 2024 - arxiv.org

Learning from observation (LfO) aims to imitate experts by learning from state-only
demonstrations without requiring action labels. Existing adversarial imitation learning …

Cited by 1 · Related articles · All 3 versions

Pre-trained Visual Dynamics Representations for Efficient Policy Learning

H Luo, B Zhou, Z Lu - European Conference on Computer Vision, 2025 - Springer

Pre-training for Reinforcement Learning (RL) with purely video data is a valuable
yet challenging problem. Although in-the-wild videos are readily available and inhere a vast …

Learning an actionable discrete diffusion policy via large-scale actionless video pre-training

H He, C Bai, L Pan, W Zhang, B Zhao… - The Thirty-eighth Annual …, 2024 - openreview.net

Learning a generalist embodied agent capable of completing multiple tasks poses
challenges, primarily stemming from the scarcity of action-labeled robotic datasets. In …

Cited by 2 · Related articles

DecisionNCE: Embodied multimodal representations via implicit preference learning

J Li, J Zheng, Y Zheng, L Mao, X Hu, S Cheng… - arXiv preprint arXiv …, 2024 - arxiv.org

Multimodal pretraining has emerged as an effective strategy for the trinity of goals of
representation learning in autonomous robots: 1) extracting both local and global task …

Cited by 2 · Related articles · All 5 versions

Investigating Pre-Training Objectives for Generalization in Vision-Based Reinforcement Learning

D Kim, H Lee, K Lee, D Hwang, J Choo - arXiv preprint arXiv:2406.06037, 2024 - arxiv.org

Recently, various pre-training methods have been introduced in vision-based
Reinforcement Learning (RL). However, their generalization ability remains unclear due to …

Cited by 2 · Related articles · All 3 versions