Learning to manipulate anywhere: A visual generalizable framework for reinforcement learning

Z Yuan, T Wei, S Cheng, G Zhang, Y Chen… - arXiv preprint arXiv …, 2024 - arxiv.org
Can we endow visuomotor robots with generalization capabilities to operate in diverse open-world scenarios? In this paper, we propose Maniwhere, a generalizable framework …
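
The snippet cuts off before any method details, so as a generic illustration of the kind of image augmentation commonly used to make visual RL policies robust to appearance changes (a DrQ-style random shift; an assumption for illustration, not necessarily Maniwhere's own pipeline), here is a minimal PyTorch sketch:

```python
import torch
import torch.nn.functional as F

def random_shift(imgs: torch.Tensor, pad: int = 4) -> torch.Tensor:
    """Randomly shift a batch of images by up to `pad` pixels.

    A common augmentation for visual-RL generalization (DrQ-style);
    illustrative only, not Maniwhere's actual method.
    """
    n, c, h, w = imgs.shape
    # Replicate-pad the borders, then take a random crop per image.
    padded = F.pad(imgs, (pad, pad, pad, pad), mode="replicate")
    top = torch.randint(0, 2 * pad + 1, (n,))
    left = torch.randint(0, 2 * pad + 1, (n,))
    return torch.stack(
        [padded[i, :, top[i]:top[i] + h, left[i]:left[i] + w] for i in range(n)]
    )

obs = torch.rand(8, 3, 84, 84)   # batch of RGB observations
aug = random_shift(obs)
assert aug.shape == obs.shape
```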

Point-SAM: Promptable 3D Segmentation Model for Point Clouds

Y Zhou, J Gu, TY Chiang, F Xiang, H Su - arXiv preprint arXiv:2406.17741, 2024 - arxiv.org
The development of 2D foundation models for image segmentation has been significantly
advanced by the Segment Anything Model (SAM). However, achieving similar success in 3D …
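
As a hedged sketch of what "promptable" segmentation means for point clouds (a user clicks one point, and the model scores every point's membership in the prompted object), here is a toy interface; the names and architecture are assumptions for illustration, not Point-SAM's actual design:

```python
import torch
import torch.nn as nn

class ToyPromptablePointSeg(nn.Module):
    """Minimal stand-in for a promptable point-cloud segmenter.

    Per-point features are compared against an embedding of the user's
    point prompt to produce mask logits. Illustrative interface only.
    """

    def __init__(self, dim: int = 64):
        super().__init__()
        self.point_encoder = nn.Sequential(
            nn.Linear(3, dim), nn.ReLU(), nn.Linear(dim, dim)
        )
        self.prompt_encoder = nn.Sequential(
            nn.Linear(3, dim), nn.ReLU(), nn.Linear(dim, dim)
        )

    def forward(self, points: torch.Tensor, prompt_xyz: torch.Tensor) -> torch.Tensor:
        feats = self.point_encoder(points)       # (N, dim) per-point features
        query = self.prompt_encoder(prompt_xyz)  # (dim,) prompt embedding
        return feats @ query                     # (N,) per-point mask logits

cloud = torch.rand(2048, 3)   # N points with xyz coordinates
click = cloud[0]              # a foreground point used as the prompt
logits = ToyPromptablePointSeg()(cloud, click)
mask = logits.sigmoid() > 0.5  # boolean per-point mask
```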

SAM-E: Leveraging Visual Foundation Model with Sequence Imitation for Embodied Manipulation

J Zhang, C Bai, H He, W Xia, Z Wang, B Zhao… - arXiv preprint arXiv …, 2024 - arxiv.org
Acquiring a multi-task imitation policy in 3D manipulation poses challenges in terms of
scene understanding and action prediction. Current methods employ both 3D representation …
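
One ingredient the title points at, sequence imitation, can be illustrated with an action-chunking head that predicts a short horizon of future actions in a single forward pass. The dimensions and names below are assumptions for the sketch, not SAM-E's actual decoder:

```python
import torch
import torch.nn as nn

class ActionSequenceHead(nn.Module):
    """Sketch of an action-chunking head: predict H future actions at once.

    Illustrates the sequence-imitation idea (one forward pass yields a
    short action plan); not SAM-E's actual architecture.
    """

    def __init__(self, feat_dim: int = 256, act_dim: int = 7, horizon: int = 8):
        super().__init__()
        self.horizon, self.act_dim = horizon, act_dim
        self.mlp = nn.Sequential(
            nn.Linear(feat_dim, 512), nn.ReLU(),
            nn.Linear(512, horizon * act_dim),
        )

    def forward(self, feat: torch.Tensor) -> torch.Tensor:
        # (B, feat_dim) -> (B, horizon, act_dim)
        return self.mlp(feat).view(-1, self.horizon, self.act_dim)

feat = torch.rand(4, 256)          # e.g. pooled visual-foundation features
plan = ActionSequenceHead()(feat)  # (4, 8, 7): 8-step plans of 7-DoF actions
# Behavior cloning would regress `plan` against demonstrated action chunks.
loss = nn.functional.mse_loss(plan, torch.zeros_like(plan))
```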

Policy-shaped prediction: avoiding distractions in model-based reinforcement learning

M Hutson, I Kauvar, N Haber - arXiv preprint arXiv:2412.05766, 2024 - arxiv.org
Model-based reinforcement learning (MBRL) is a promising route to sample-efficient policy
optimization. However, a known vulnerability of reconstruction-based MBRL consists of …
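
The vulnerability alluded to is that a plain pixel-reconstruction loss rewards modeling distractors as much as task-relevant content. A minimal sketch of one remedy is to reweight the loss by a relevance map; the `weight` input below is an assumed stand-in (e.g. from a policy-saliency estimate), not the paper's exact procedure:

```python
import torch

def weighted_recon_loss(pred: torch.Tensor,
                        target: torch.Tensor,
                        weight: torch.Tensor) -> torch.Tensor:
    """Per-pixel reconstruction loss reweighted by task relevance.

    With weight == 1 everywhere this is plain reconstruction, which spends
    model capacity on whatever dominates the pixels, distractors included.
    Downweighting pixels the policy does not depend on is one remedy.
    """
    per_pixel = (pred - target).pow(2)   # (B, C, H, W) squared error
    return (weight * per_pixel).mean()

pred = torch.rand(2, 3, 64, 64, requires_grad=True)   # world-model reconstruction
target = torch.rand(2, 3, 64, 64)                     # observed frame
saliency = torch.rand(2, 1, 64, 64)                   # hypothetical relevance weights
weighted_recon_loss(pred, target, saliency).backward()
```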

Focus On What Matters: Separated Models For Visual-Based RL Generalization

D Zhang, B Lv, H Zhang, F Yang, J Zhao, H Yu… - arXiv preprint arXiv …, 2024 - arxiv.org
A primary challenge for visual-based Reinforcement Learning (RL) is to generalize
effectively across unseen environments. Although previous studies have explored different …
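
A minimal sketch of the separated-models idea, assuming two parallel encoders where only the task-relevant branch feeds the policy while the other branch absorbs distractor variation via auxiliary objectives; the wiring and names are illustrative, not the paper's exact design:

```python
import torch
import torch.nn as nn

class SeparatedEncoders(nn.Module):
    """Sketch of separated models for visual-RL generalization.

    A task-relevant branch feeds the actor/critic; a distractor branch is
    trained only on auxiliary losses (e.g. reconstruction) so nuisance
    visual variation never reaches the policy. Illustrative wiring only.
    """

    def __init__(self, in_ch: int = 3, dim: int = 128):
        super().__init__()
        def conv_encoder():
            return nn.Sequential(
                nn.Conv2d(in_ch, 32, 3, stride=2), nn.ReLU(),
                nn.Conv2d(32, 32, 3, stride=2), nn.ReLU(),
                nn.Flatten(), nn.LazyLinear(dim),
            )
        self.task_enc = conv_encoder()        # feeds the policy
        self.distractor_enc = conv_encoder()  # trained only on auxiliary losses

    def forward(self, obs: torch.Tensor):
        return self.task_enc(obs), self.distractor_enc(obs)

obs = torch.rand(4, 3, 84, 84)
task_feat, nuisance_feat = SeparatedEncoders()(obs)
policy_input = task_feat  # distractor features stay out of the policy
```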