Task and motion planning with large language models for object rearrangement

Y Ding, X Zhang, C Paxton… - 2023 IEEE/RSJ …, 2023 - ieeexplore.ieee.org
Multi-object rearrangement is a crucial skill for service robots, and commonsense reasoning
is frequently needed in this process. However, achieving commonsense arrangements …

Skill transformer: A monolithic policy for mobile manipulation

X Huang, D Batra, A Rai, A Szot - Proceedings of the IEEE …, 2023 - openaccess.thecvf.com
We present Skill Transformer, an approach for solving long-horizon robotic tasks by
combining conditional sequence modeling and skill modularity. Conditioned on egocentric …

Large language models as generalizable policies for embodied tasks

A Szot, M Schwarzer, H Agrawal… - The Twelfth …, 2023 - openreview.net
We show that large language models (LLMs) can be adapted to be generalizable policies
for embodied visual tasks. Our approach, called Large LAnguage model Reinforcement …

Galactic: Scaling end-to-end reinforcement learning for rearrangement at 100k steps-per-second

VP Berges, A Szot, DS Chaplot… - Proceedings of the …, 2023 - openaccess.thecvf.com
We present Galactic, a large-scale simulation and reinforcement-learning (RL) framework for
robotic mobile manipulation in indoor environments. Specifically, a Fetch robot (equipped …

Adaptive coordination in social embodied rearrangement

A Szot, U Jain, D Batra, Z Kira… - … on Machine Learning, 2023 - proceedings.mlr.press
We present the task of "Social Rearrangement", consisting of cooperative everyday tasks
like setting up the dinner table, tidying a house or unpacking groceries in a simulated multi …

NEWTON: Are large language models capable of physical reasoning?

YR Wang, J Duan, D Fox, S Srinivasa - arXiv preprint arXiv:2310.07018, 2023 - arxiv.org
Large Language Models (LLMs), through their contextualized representations, have been
empirically proven to encapsulate syntactic, semantic, word sense, and common-sense …

Leveraging commonsense knowledge from large language models for task and motion planning

Y Ding, X Zhang, C Paxton, S Zhang - RSS 2023 Workshop on …, 2023 - openreview.net
Multi-object rearrangement is a crucial skill for service robots, and commonsense reasoning
is frequently needed in this process. However, achieving commonsense arrangements …

Reinforcement Learning via Auxiliary Task Distillation

AN Harish, L Heck, JP Hanna, Z Kira, A Szot - European Conference on …, 2025 - Springer
We present Reinforcement Learning via Auxiliary Task Distillation (AuxDistill), a
new method that enables reinforcement learning (RL) to perform long-horizon robot control …

Enhanced Robot Navigation with Human Geometric Instruction

H Deguchi, S Taguchi, K Shibata… - 2023 IEEE/RSJ …, 2023 - ieeexplore.ieee.org
Recently, robot navigation methods using human instructions have been actively studied,
including visual language navigation. Although language is one of the most promising forms …

Grounding Multimodal Large Language Models in Actions

A Szot, B Mazoure, H Agrawal, D Hjelm, Z Kira… - arXiv preprint arXiv …, 2024 - arxiv.org
Multimodal Large Language Models (MLLMs) have demonstrated a wide range of
capabilities across many domains, including Embodied AI. In this work, we study how to best …