Generative Image as Action Models
M Shridhar, YL Lo, S James - arXiv preprint arXiv:2407.07875, 2024 - arxiv.org
Image-generation diffusion models have been fine-tuned to unlock new capabilities such as
image-editing and novel view synthesis. Can we similarly unlock image-generation models …
Green Screen Augmentation Enables Scene Generalisation in Robotic Manipulation
Generalising vision-based manipulation policies to novel environments remains a
challenging area with limited exploration. Current practices involve collecting data in one …
Language-Guided Object-Centric Diffusion Policy for Collision-Aware Robotic Manipulation
Learning from demonstrations faces challenges in generalizing beyond the training data
and is fragile even to slight visual variations. To tackle this problem, we introduce Lan-o3dp …
BiGym: A Demo-Driven Mobile Bi-Manual Manipulation Benchmark
We introduce BiGym, a new benchmark and learning environment for mobile bi-manual
demo-driven robotic manipulation. BiGym features 40 diverse tasks set in home …
Learning Robotic Manipulation Policies from Point Clouds with Conditional Flow Matching
Learning from expert demonstrations is a promising approach for training robotic
manipulation policies from limited data. However, imitation learning algorithms require a …
Sparse Diffusion Policy: A Sparse, Reusable, and Flexible Policy for Robot Learning
The increasing complexity of tasks in robotics demands efficient strategies for multitask and
continual learning. Traditional models typically rely on a universal policy for all tasks, facing …
Manipulation Facing Threats: Evaluating Physical Vulnerabilities in End-to-End Vision Language Action Models
Recently, driven by advancements in Multimodal Large Language Models (MLLMs), Vision
Language Action Models (VLAMs) are being proposed to achieve better performance in …
Hierarchical Hybrid Learning for Long-Horizon Contact-Rich Robotic Assembly
Generalizable long-horizon robotic assembly requires reasoning at multiple levels of
abstraction. End-to-end imitation learning (IL) has been proven a promising approach, but it …
Deep Generative Models in Robotics: A Survey on Learning from Multimodal Demonstrations
Learning from Demonstrations, the field that proposes to learn robot behavior models from
data, is gaining popularity with the emergence of deep generative models. Although the …
Render and Diffuse: Aligning Image and Action Spaces for Diffusion-based Behaviour Cloning
In the field of Robot Learning, the complex mapping between high-dimensional
observations such as RGB images and low-level robotic actions, two inherently very different …