Perceiver-Actor: A multi-task transformer for robotic manipulation

M Shridhar, L Manuelli, D Fox - Conference on Robot …, 2023 - proceedings.mlr.press
Transformers have revolutionized vision and natural language processing with their ability to
scale with large datasets. But in robotic manipulation, data is both limited and expensive …

Instruction-driven history-aware policies for robotic manipulations

PL Guhur, S Chen, RG Pinel… - … on Robot Learning, 2023 - proceedings.mlr.press
In human environments, robots are expected to accomplish a variety of manipulation tasks
given simple natural language instructions. Yet, robotic manipulation is extremely …

CALVIN: A benchmark for language-conditioned policy learning for long-horizon robot manipulation tasks

O Mees, L Hermann, E Rosete-Beas… - IEEE Robotics and …, 2022 - ieeexplore.ieee.org
General-purpose robots coexisting with humans in their environment must learn to relate
human language to their perceptions and actions to be useful in a range of daily tasks …

What matters in language conditioned robotic imitation learning over unstructured data

O Mees, L Hermann, W Burgard - IEEE Robotics and …, 2022 - ieeexplore.ieee.org
A long-standing goal in robotics is to build robots that can perform a wide range of daily
tasks from perceptions obtained with their onboard sensors and specified only via natural …

Grounding language with visual affordances over unstructured data

O Mees, J Borja-Diaz, W Burgard - 2023 IEEE International …, 2023 - ieeexplore.ieee.org
Recent works have shown that Large Language Models (LLMs) can be applied to ground
natural language to a wide variety of robot skills. However, in practice, learning multi-task …

Concept2Robot: Learning manipulation concepts from instructions and human demonstrations

L Shao, T Migimatsu, Q Zhang… - … Journal of Robotics …, 2021 - journals.sagepub.com
We aim to endow a robot with the ability to learn manipulation concepts that link natural
language instructions to motor skills. Our goal is to learn a single multi-task policy that takes …

Artificial intelligence foundation and pre-trained models: Fundamentals, applications, opportunities, and social impacts

A Kolides, A Nawaz, A Rathor, D Beeman… - … Modelling Practice and …, 2023 - Elsevier
With the emergence of foundation models (FMs) that are trained on large amounts of data at
scale and adaptable to a wide range of downstream applications, AI is experiencing a …

GraspGPT: Leveraging semantic knowledge from a large language model for task-oriented grasping

C Tang, D Huang, W Ge, W Liu… - IEEE Robotics and …, 2023 - ieeexplore.ieee.org
Task-oriented grasping (TOG) refers to the problem of predicting grasps on an object that
enable subsequent manipulation tasks. To model the complex relationships between …

Task-oriented grasp prediction with visual-language inputs

C Tang, D Huang, L Meng, W Liu… - 2023 IEEE/RSJ …, 2023 - ieeexplore.ieee.org
To perform household tasks, assistive robots receive commands in the form of user
language instructions for tool manipulation. The initial stage involves selecting the intended …

A joint network for grasp detection conditioned on natural language commands

Y Chen, R Xu, Y Lin, PA Vela - 2021 IEEE International …, 2021 - ieeexplore.ieee.org
We consider the task of grasping a target object based on a natural language command
query. Previous work primarily focused on localizing the object given the query, which …