GELLO: A general, low-cost, and intuitive teleoperation framework for robot manipulators

P Wu, Y Shentu, Z Yi, X Lin… - 2024 IEEE/RSJ …, 2024 - ieeexplore.ieee.org
Humans can teleoperate robots to accomplish complex manipulation tasks. Imitation
learning has emerged as a powerful framework that leverages human teleoperated …

ReKep: Spatio-temporal reasoning of relational keypoint constraints for robotic manipulation

W Huang, C Wang, Y Li, R Zhang, L Fei-Fei - arXiv preprint arXiv …, 2024 - arxiv.org
Representing robotic manipulation tasks as constraints that associate the robot and the
environment is a promising way to encode desired robot behaviors. However, it remains …

BootsTAP: Bootstrapped training for tracking-any-point

C Doersch, P Luc, Y Yang, D Gokay… - Proceedings of the …, 2024 - openaccess.thecvf.com
To endow models with greater understanding of physics and motion, it is useful to enable
them to perceive how solid surfaces move and deform in real scenes. This can be formalized …

Flow as the cross-domain manipulation interface

M Xu, Z Xu, Y Xu, C Chi, G Wetzstein, M Veloso… - arXiv preprint arXiv …, 2024 - arxiv.org
We present Im2Flow2Act, a scalable learning framework that enables robots to acquire real-
world manipulation skills without the need for real-world robot training data. The key idea …

Track2Act: Predicting point tracks from internet videos enables generalizable robot manipulation

H Bharadhwaj, R Mottaghi, A Gupta… - European Conference on …, 2025 - Springer
We seek to learn a generalizable goal-conditioned policy that enables diverse robot
manipulation—interacting with unseen objects in novel scenes without test-time adaptation …

Gen2Act: Human video generation in novel scenarios enables generalizable robot manipulation

H Bharadhwaj, D Dwibedi, A Gupta, S Tulsiani… - arXiv preprint arXiv …, 2024 - arxiv.org
How can robot manipulation policies generalize to novel tasks involving unseen object types
and new motions? In this paper, we provide a solution in terms of predicting motion …

Latent action pretraining from videos

S Ye, J Jang, B Jeon, S Joo, J Yang, B Peng… - arXiv preprint arXiv …, 2024 - arxiv.org
We introduce Latent Action Pretraining for general Action models (LAPA), an unsupervised
method for pretraining Vision-Language-Action (VLA) models without ground-truth robot …

Robot see robot do: Imitating articulated object manipulation with monocular 4d reconstruction

J Kerr, CM Kim, M Wu, B Yi, Q Wang… - arXiv preprint arXiv …, 2024 - arxiv.org
Humans can learn to manipulate new objects by simply watching others; providing robots
with the ability to learn from such demonstrations would enable a natural interface for specifying …

Position: video as the new language for real-world decision making

S Yang, JC Walker, J Parker-Holder, Y Du… - … on Machine Learning, 2024 - openreview.net
Both text and video data are abundant on the internet and support large-scale self-
supervised learning through next token or frame prediction. However, they have not been …