GELLO: A general, low-cost, and intuitive teleoperation framework for robot manipulators

P Wu, Y Shentu, Z Yi, X Lin… - 2024 IEEE/RSJ …, 2024 - ieeexplore.ieee.org
Humans can teleoperate robots to accomplish complex manipulation tasks. Imitation
learning has emerged as a powerful framework that leverages human teleoperated …

ReKep: Spatio-temporal reasoning of relational keypoint constraints for robotic manipulation

W Huang, C Wang, Y Li, R Zhang, L Fei-Fei - arXiv preprint arXiv …, 2024 - arxiv.org
Representing robotic manipulation tasks as constraints that associate the robot and the
environment is a promising way to encode desired robot behaviors. However, it remains …

BootsTAP: Bootstrapped training for tracking-any-point

C Doersch, P Luc, Y Yang, D Gokay… - Proceedings of the …, 2024 - openaccess.thecvf.com
To endow models with greater understanding of physics and motion, it is useful to enable
them to perceive how solid surfaces move and deform in real scenes. This can be formalized …

Flow as the cross-domain manipulation interface

M Xu, Z Xu, Y Xu, C Chi, G Wetzstein, M Veloso… - arXiv preprint arXiv …, 2024 - arxiv.org
We present Im2Flow2Act, a scalable learning framework that enables robots to acquire real-
world manipulation skills without the need for real-world robot training data. The key idea …

Track2Act: Predicting point tracks from internet videos enables generalizable robot manipulation

H Bharadhwaj, R Mottaghi, A Gupta… - European Conference on …, 2025 - Springer
We seek to learn a generalizable goal-conditioned policy that enables diverse robot
manipulation—interacting with unseen objects in novel scenes without test-time adaptation …

Gen2Act: Human video generation in novel scenarios enables generalizable robot manipulation

H Bharadhwaj, D Dwibedi, A Gupta, S Tulsiani… - arXiv preprint arXiv …, 2024 - arxiv.org
How can robot manipulation policies generalize to novel tasks involving unseen object types
and new motions? In this paper, we provide a solution in terms of predicting motion …

Latent action pretraining from videos

S Ye, J Jang, B Jeon, S Joo, J Yang, B Peng… - arXiv preprint arXiv …, 2024 - arxiv.org
We introduce Latent Action Pretraining for general Action models (LAPA), an unsupervised
method for pretraining Vision-Language-Action (VLA) models without ground-truth robot …

Robot see robot do: Imitating articulated object manipulation with monocular 4d reconstruction

J Kerr, CM Kim, M Wu, B Yi, Q Wang… - arXiv preprint arXiv …, 2024 - arxiv.org
Humans can learn to manipulate new objects by simply watching others; providing robots
with the ability to learn from such demonstrations would enable a natural interface for specifying …

Position: video as the new language for real-world decision making

S Yang, JC Walker, J Parker-Holder, Y Du… - … on Machine Learning, 2024 - openreview.net
Both text and video data are abundant on the internet and support large-scale self-
supervised learning through next token or frame prediction. However, they have not been …