Foundation models in robotics: Applications, challenges, and the future

R Firoozi, J Tucker, S Tian… - … Journal of Robotics …, 2023 - journals.sagepub.com
We survey applications of pretrained foundation models in robotics. Traditional deep
learning models in robotics are trained on small datasets tailored for specific tasks, which …

Robot learning in the era of foundation models: A survey

X Xiao, J Liu, Z Wang, Y Zhou, Y Qi, Q Cheng… - arXiv preprint arXiv …, 2023 - arxiv.org
The proliferation of Large Language Models (LLMs) has fueled a shift in robot learning
from automation towards general embodied Artificial Intelligence (AI). Adopting foundation …

Diffusion world model

Z Ding, A Zhang, Y Tian, Q Zheng - arXiv preprint arXiv:2402.03570, 2024 - arxiv.org
We introduce Diffusion World Model (DWM), a conditional diffusion model capable of
predicting multistep future states and rewards concurrently. As opposed to traditional one …

Point Cloud Matters: Rethinking the Impact of Different Observation Spaces on Robot Learning

H Zhu, Y Wang, D Huang, W Ye, W Ouyang… - arXiv preprint arXiv …, 2024 - arxiv.org
In this study, we explore the influence of different observation spaces on robot learning,
focusing on three predominant modalities: RGB, RGB-D, and point cloud. Through extensive …

EasyHeC: Accurate and automatic hand-eye calibration via differentiable rendering and space exploration

L Chen, Y Qin, X Zhou, H Su - IEEE Robotics and Automation …, 2023 - ieeexplore.ieee.org
Hand-eye calibration is a critical task in robotics, as it directly affects the efficacy of critical
operations such as manipulation and grasping. Traditional methods for achieving this …

A Survey on Integration of Large Language Models with Intelligent Robots

Y Kim, D Kim, J Choi, J Park, N Oh, D Park - arXiv preprint arXiv …, 2024 - arxiv.org
In recent years, the integration of large language models (LLMs) has revolutionized the field
of robotics, enabling robots to communicate, understand, and reason with human-like …

Efficient Planning with Latent Diffusion

W Li - arXiv preprint arXiv:2310.00311, 2023 - arxiv.org
Temporal abstraction and efficient planning pose significant challenges in offline
reinforcement learning, mainly when dealing with domains that involve temporally extended …

SAM-E: Leveraging Visual Foundation Model with Sequence Imitation for Embodied Manipulation

J Zhang, C Bai, H He, W Xia, Z Wang, B Zhao… - arXiv preprint arXiv …, 2024 - arxiv.org
Acquiring a multi-task imitation policy in 3D manipulation poses challenges in terms of
scene understanding and action prediction. Current methods employ both 3D representation …

Play to the Score: Stage-Guided Dynamic Multi-Sensory Fusion for Robotic Manipulation

R Feng, D Hu, W Ma, X Li - arXiv preprint arXiv:2408.01366, 2024 - arxiv.org
Humans possess a remarkable talent for flexibly alternating to different senses when
interacting with the environment. Picture a chef skillfully gauging the timing of ingredient …

MaxMI: A Maximal Mutual Information Criterion for Manipulation Concept Discovery

P Zhou, Y Yang - arXiv preprint arXiv:2407.15086, 2024 - arxiv.org
We aim to discover manipulation concepts embedded in the unannotated demonstrations,
which are recognized as key physical states. The discovered concepts can facilitate training …