A review of robot learning for manipulation: Challenges, representations, and algorithms
A key challenge in intelligent robotics is creating robots that are capable of directly
interacting with the world around them to achieve their goals. The last decade has seen …
Embodied communication: How robots and people communicate through physical interaction
A Kalinowska, PM Pilarski… - Annual review of control …, 2023 - annualreviews.org
Early research on physical human–robot interaction (pHRI) has necessarily focused on
device design—the creation of compliant and sensorized hardware, such as exoskeletons …
Defining and characterizing reward gaming
We provide the first formal definition of reward hacking, a phenomenon where
optimizing an imperfect proxy reward function, $\tilde{\mathcal{R}}$, leads to poor …
Robots that ask for help: Uncertainty alignment for large language model planners
Large language models (LLMs) exhibit a wide range of promising capabilities, from step-by-step
planning to commonsense reasoning, that may provide utility for robots, but remain …
Few-shot preference learning for human-in-the-loop rl
DJ Hejna III, D Sadigh - Conference on Robot Learning, 2023 - proceedings.mlr.press
While reinforcement learning (RL) has become a more popular approach for robotics,
designing sufficiently informative reward functions for complex tasks has proven to be …
Discriminator-weighted offline imitation learning from suboptimal demonstrations
We study the problem of offline Imitation Learning (IL) where an agent aims to learn an
optimal expert behavior policy without additional online environment interactions. Instead …
Ceil: Generalized contextual imitation learning
In this paper, we present ContExtual Imitation Learning (CEIL), a general and broadly
applicable algorithm for imitation learning (IL). Inspired by the formulation of hindsight …
Learning from suboptimal demonstration via self-supervised reward regression
Learning from Demonstration (LfD) seeks to democratize robotics by enabling non-roboticist
end-users to teach robots to perform a task by providing a human demonstration …
Benchmarks and algorithms for offline preference-based reward learning
Learning a reward function from human preferences is challenging as it typically requires
having a high-fidelity simulator or using expensive and potentially unsafe actual physical …
Imitation learning by estimating expertise of demonstrators
Many existing imitation learning datasets are collected from multiple demonstrators, each
with different expertise at different parts of the environment. Yet, standard imitation learning …