ReplanVLM: Replanning robotic tasks with visual language models

A Mei, GN Zhu, H Zhang, Z Gan - IEEE Robotics and …, 2024 - ieeexplore.ieee.org
Large language models (LLMs) have gained increasing popularity in robotic task planning
due to their exceptional abilities in text analytics and generation, as well as their broad …

Closed-loop open-vocabulary mobile manipulation with gpt-4v

P Zhi, Z Zhang, M Han, Z Zhang, Z Li, Z Jiao… - arXiv preprint arXiv …, 2024 - arxiv.org
Autonomous robot navigation and manipulation in open environments require reasoning
and replanning with closed-loop feedback. We present COME-robot, the first closed-loop …

ORGANA: A Robotic Assistant for Automated Chemistry Experimentation and Characterization

K Darvish, M Skreta, Y Zhao, N Yoshikawa… - arXiv preprint arXiv …, 2024 - arxiv.org
Chemistry experimentation is often resource- and labor-intensive. Despite the many benefits
incurred by the integration of advanced and special-purpose lab equipment, many aspects …

AHA: A Vision-Language-Model for Detecting and Reasoning Over Failures in Robotic Manipulation

J Duan, W Pumacay, N Kumar, YR Wang… - arXiv preprint arXiv …, 2024 - arxiv.org
Robotic manipulation in open-world settings requires not only task execution but also the
ability to detect and learn from failures. While recent advances in vision-language models …

VLMimic: Vision Language Models are Visual Imitation Learner for Fine-grained Actions

G Chen, M Wang, T Cui, Y Mu, H Lu, T Zhou… - arXiv preprint arXiv …, 2024 - arxiv.org
Visual imitation learning (VIL) provides an efficient and intuitive strategy for robotic systems
to acquire novel skills. Recent advancements in Vision Language Models (VLMs) have …

Guiding Long-Horizon Task and Motion Planning with Vision Language Models

Z Yang, C Garrett, D Fox, T Lozano-Pérez… - arXiv preprint arXiv …, 2024 - arxiv.org
Vision-Language Models (VLM) can generate plausible high-level plans when prompted
with a goal, the context, an image of the scene, and any planning constraints. However …

Sensorimotor Attention and Language-based Regressions in Shared Latent Variables for Integrating Robot Motion Learning and LLM

K Suzuki, T Ogata - … on Intelligent Robots and Systems (IROS), 2024 - ieeexplore.ieee.org
In recent years, studies have been actively conducted on combining large language models
(LLM) and robotics; however, most have not considered end-to-end feedback in the robot …

GameVLM: A Decision-making Framework for Robotic Task Planning Based on Visual Language Models and Zero-sum Games

A Mei, J Wang, GN Zhu, Z Gan - arXiv preprint arXiv:2405.13751, 2024 - arxiv.org
With their prominent scene understanding and reasoning capabilities, pre-trained
visual-language models (VLMs) such as GPT-4V have attracted increasing attention in robotic task …

SuFIA: Language-Guided Augmented Dexterity for Robotic Surgical Assistants

M Moghani, L Doorenbos, WCH Panitch… - arXiv preprint arXiv …, 2024 - arxiv.org
In this work, we present SuFIA, the first framework for natural language-guided augmented
dexterity for robotic surgical assistants. SuFIA incorporates the strong reasoning capabilities …

Creative Problem Solving in Large Language and Vision Models--What Would it Take?

L Nair, E Gizzi, J Sinapov - arXiv preprint arXiv:2405.01453, 2024 - arxiv.org
In this paper, we discuss approaches for integrating Computational Creativity (CC) with
research in large language and vision models (LLVMs) to address a key limitation of these …