Generative Image as Action Models

M Shridhar, YL Lo, S James - arXiv preprint arXiv:2407.07875, 2024 - arxiv.org
Image-generation diffusion models have been fine-tuned to unlock new capabilities such as
image editing and novel view synthesis. Can we similarly unlock image-generation models …

Green Screen Augmentation Enables Scene Generalisation in Robotic Manipulation

E Teoh, S Patidar, X Ma, S James - arXiv preprint arXiv:2407.07868, 2024 - arxiv.org
Generalising vision-based manipulation policies to novel environments remains a
challenging and under-explored problem. Current practices involve collecting data in one …
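
The core recipe is simple enough to sketch: demonstrations are collected in front of a green screen, so a chroma-key mask can composite the robot and manipulated objects onto arbitrary backgrounds at training time. A minimal NumPy illustration of that augmentation step (the threshold and function names are assumptions for illustration, not the paper's implementation):

```python
import numpy as np

def chroma_key_mask(rgb: np.ndarray, margin: int = 40) -> np.ndarray:
    """Boolean mask of green-screen pixels: green dominates red and blue.

    rgb: uint8 image of shape (H, W, 3); `margin` is an illustrative threshold.
    """
    r, g, b = rgb[..., 0].astype(int), rgb[..., 1].astype(int), rgb[..., 2].astype(int)
    return (g - np.maximum(r, b)) > margin

def composite_background(rgb: np.ndarray, background: np.ndarray) -> np.ndarray:
    """Replace green-screen pixels with pixels from a replacement background."""
    mask = chroma_key_mask(rgb)
    out = rgb.copy()
    out[mask] = background[mask]
    return out

# Usage: pair each training frame with a randomly drawn background image.
frame = np.zeros((256, 256, 3), dtype=np.uint8)
frame[..., 1] = 255                                   # pretend the frame is all screen
new_bg = np.random.randint(0, 256, frame.shape, dtype=np.uint8)
augmented = composite_background(frame, new_bg)
```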

Language-Guided Object-Centric Diffusion Policy for Collision-Aware Robotic Manipulation

H Li, Q Feng, Z Zheng, J Feng, A Knoll - arXiv preprint arXiv:2407.00451, 2024 - arxiv.org
Learning from demonstrations struggles to generalize beyond the training data and remains
fragile even under slight visual variations. To tackle this problem, we introduce Lan-o3dp …
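
For context, a diffusion policy of this kind is typically trained with a standard denoising objective applied to action sequences rather than images. A generic PyTorch sketch of one training step, conditioned on observation features (this is the common recipe, not Lan-o3dp's specific object-centric conditioning; names and shapes are assumptions):

```python
import torch
import torch.nn.functional as F

def diffusion_policy_loss(model, actions, cond, alphas_cumprod):
    """One DDPM-style step: the model learns to predict the noise added to actions.

    actions:        (B, T, A) clean expert action chunks
    cond:           (B, C)    observation features, e.g. an object-centric encoding
    alphas_cumprod: (K,)      cumulative noise schedule
    """
    B = actions.shape[0]
    k = torch.randint(0, alphas_cumprod.shape[0], (B,), device=actions.device)
    a_bar = alphas_cumprod[k].view(B, 1, 1)
    noise = torch.randn_like(actions)
    noisy_actions = a_bar.sqrt() * actions + (1 - a_bar).sqrt() * noise
    pred_noise = model(noisy_actions, k, cond)        # epsilon-prediction network
    return F.mse_loss(pred_noise, noise)
```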

BiGym: A Demo-Driven Mobile Bi-Manual Manipulation Benchmark

N Chernyadev, N Backshall, X Ma, Y Lu, Y Seo… - arXiv preprint arXiv …, 2024 - arxiv.org
We introduce BiGym, a new benchmark and learning environment for mobile bi-manual
demo-driven robotic manipulation. BiGym features 40 diverse tasks set in home …

Learning Robotic Manipulation Policies from Point Clouds with Conditional Flow Matching

E Chisari, N Heppert, M Argus, T Welschehold… - arXiv preprint arXiv …, 2024 - arxiv.org
Learning from expert demonstrations is a promising approach for training robotic
manipulation policies from limited data. However, imitation learning algorithms require a …
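
Conditional flow matching has a notably compact training objective: interpolate linearly between a noise sample and an expert action, then regress a velocity field toward their difference. A minimal PyTorch sketch under the common linear-interpolant (rectified-flow) parameterisation (the conditioning and shapes are illustrative assumptions, not the paper's exact setup):

```python
import torch
import torch.nn.functional as F

def conditional_flow_matching_loss(velocity_net, expert_actions, obs_features):
    """Learn v(x_t, t, obs) to match x_1 - x_0 along straight noise-to-data paths.

    expert_actions: (B, A) target samples x_1, e.g. end-effector poses
    obs_features:   (B, C) conditioning, e.g. a point-cloud encoding
    """
    x1 = expert_actions
    x0 = torch.randn_like(x1)                       # noise source distribution
    t = torch.rand(x1.shape[0], 1, device=x1.device)
    xt = (1 - t) * x0 + t * x1                      # point on the straight path
    target_velocity = x1 - x0                       # constant along that path
    pred_velocity = velocity_net(xt, t, obs_features)
    return F.mse_loss(pred_velocity, target_velocity)
```

At inference time an action is produced by integrating the learned velocity field from a noise sample, e.g. with a handful of Euler steps, which makes flow-matching policies attractive when few network evaluations are affordable.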

Sparse Diffusion Policy: A Sparse, Reusable, and Flexible Policy for Robot Learning

Y Wang, Y Zhang, M Huo, R Tian, X Zhang… - arXiv preprint arXiv …, 2024 - arxiv.org
The increasing complexity of tasks in robotics demands efficient strategies for multitask and
continual learning. Traditional models typically rely on a universal policy for all tasks, facing …
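
The "sparse, reusable" framing points at a mixture-of-experts layout: a router activates only a few experts per input, and experts trained for one task can be frozen or reused for later ones. A generic top-k routing layer in PyTorch, as one plausible reading of the idea (an illustration of the general technique, not the paper's architecture):

```python
import torch
import torch.nn as nn

class SparseExpertLayer(nn.Module):
    """Route each input through its top-k experts and mix by router weights."""

    def __init__(self, dim: int, num_experts: int = 8, k: int = 2):
        super().__init__()
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim))
            for _ in range(num_experts)
        )
        self.router = nn.Linear(dim, num_experts)
        self.k = k

    def forward(self, x: torch.Tensor) -> torch.Tensor:  # x: (B, dim)
        topk = self.router(x).topk(self.k, dim=-1)
        weights = topk.values.softmax(dim=-1)             # (B, k)
        out = torch.zeros_like(x)
        for slot in range(self.k):
            for e, expert in enumerate(self.experts):
                hit = topk.indices[:, slot] == e          # inputs routed to expert e
                if hit.any():
                    out[hit] += weights[hit, slot, None] * expert(x[hit])
        return out
```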

Manipulation Facing Threats: Evaluating Physical Vulnerabilities in End-to-End Vision Language Action Models

H Cheng, E Xiao, C Yu, Z Yao, J Cao, Q Zhang… - arXiv preprint arXiv …, 2024 - arxiv.org
Recently, driven by advancements in Multimodal Large Language Models (MLLMs), Vision
Language Action Models (VLAMs) are being proposed to achieve better performance in …
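
Evaluations of physical robustness usually start from standard white-box baselines on the visual input. As a point of reference, a one-step FGSM perturbation against a model's action loss looks like this in PyTorch (a textbook sketch; the model interface and loss are assumptions, not the paper's attack):

```python
import torch
import torch.nn.functional as F

def fgsm_image_attack(model, image, instruction, true_action, epsilon=8 / 255):
    """One-step FGSM: nudge pixels along the gradient that worsens the action loss."""
    image = image.clone().detach().requires_grad_(True)
    pred_action = model(image, instruction)           # VLAM-style: image + text -> action
    loss = F.mse_loss(pred_action, true_action)
    loss.backward()
    adv_image = image + epsilon * image.grad.sign()   # ascend the loss surface
    return adv_image.clamp(0.0, 1.0).detach()
```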

Hierarchical Hybrid Learning for Long-Horizon Contact-Rich Robotic Assembly

J Sun, A Curtis, Y You, Y Xu, M Koehle… - arXiv preprint arXiv …, 2024 - arxiv.org
Generalizable long-horizon robotic assembly requires reasoning at multiple levels of
abstraction. End-to-end imitation learning (IL) has proven to be a promising approach, but it …

Deep Generative Models in Robotics: A Survey on Learning from Multimodal Demonstrations

J Urain, A Mandlekar, Y Du, M Shafiullah, D Xu… - arXiv preprint arXiv …, 2024 - arxiv.org
Learning from Demonstrations, the field that proposes to learn robot behavior models from
data, is gaining popularity with the emergence of deep generative models. Although the …

Render and Diffuse: Aligning Image and Action Spaces for Diffusion-based Behaviour Cloning

V Vosylius, Y Seo, J Uruç, S James - arXiv preprint arXiv:2405.18196, 2024 - arxiv.org
In the field of Robot Learning, the complex mapping between high-dimensional
observations such as RGB images and low-level robotic actions, two inherently very different …
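
The alignment in the title can be grounded in the standard pinhole camera model: a 3D action, such as a gripper waypoint, maps into the observation's pixel space through the camera intrinsics and extrinsics, so actions can be rendered into the same space the policy perceives. A minimal NumPy sketch of that projection (variable names are illustrative, not the paper's code):

```python
import numpy as np

def project_to_image(point_world: np.ndarray, K: np.ndarray, T_cam_world: np.ndarray):
    """Project a 3D world-frame point to pixel coordinates (u, v).

    point_world: (3,) position, e.g. a predicted gripper waypoint
    K:           (3, 3) camera intrinsics
    T_cam_world: (4, 4) world-to-camera extrinsic transform
    """
    p_h = np.append(point_world, 1.0)                 # homogeneous coordinates
    p_cam = (T_cam_world @ p_h)[:3]                   # express point in camera frame
    uvw = K @ p_cam
    return uvw[:2] / uvw[2]                           # perspective divide
```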