Intelligent problem-solving as integrated hierarchical reinforcement learning

M Eppe, C Gumbsch, M Kerzel, PDH Nguyen… - Nature Machine …, 2022 - nature.com
According to cognitive psychology and related disciplines, the development of complex
problem-solving behaviour in biological agents depends on hierarchical cognitive …

Learning agile soccer skills for a bipedal robot with deep reinforcement learning

T Haarnoja, B Moran, G Lever, SH Huang… - Science Robotics, 2024 - science.org
We investigated whether deep reinforcement learning (deep RL) is able to synthesize
sophisticated and safe movement skills for a low-cost, miniature humanoid robot that can be …

From motor control to team play in simulated humanoid football

S Liu, G Lever, Z Wang, J Merel, SMA Eslami… - Science Robotics, 2022 - science.org
Learning to combine control at the level of joint torques with longer-term goal-directed
behavior is a long-standing challenge for physically embodied artificial agents. Intelligent …

The challenges of exploration for offline reinforcement learning

N Lambert, M Wulfmeier, W Whitney, A Byravan… - arXiv preprint arXiv …, 2022 - arxiv.org
Offline Reinforcement Learning (ORL) enables us to separately study the two interlinked
processes of reinforcement learning: collecting informative experience and inferring optimal …

Is bang-bang control all you need? Solving continuous control with Bernoulli policies

T Seyde, I Gilitschenski, W Schwarting… - Advances in …, 2021 - proceedings.neurips.cc
Reinforcement learning (RL) for continuous control typically employs distributions whose
support covers the entire action space. In this work, we investigate the colloquially known …

Independent component alignment for multi-task learning

D Senushkin, N Patakin… - Proceedings of the …, 2023 - openaccess.thecvf.com
In a multi-task learning (MTL) setting, a single model is trained to tackle a diverse set of tasks
jointly. Despite rapid progress in the field, MTL remains challenging due to optimization …

Data-efficient hindsight off-policy option learning

M Wulfmeier, D Rao, R Hafner… - International …, 2021 - proceedings.mlr.press
We introduce Hindsight Off-policy Options (HO2), a data-efficient option learning
algorithm. Given any trajectory, HO2 infers likely option choices and backpropagates …

Measuring interpretability of neural policies of robots with disentangled representation

TH Wang, W Xiao, T Seyde… - Conference on Robot …, 2023 - proceedings.mlr.press
The advancement of robots, particularly those functioning in complex human-centric
environments, relies on control solutions that are driven by machine learning …

Collect & infer - a fresh look at data-efficient reinforcement learning

M Riedmiller, JT Springenberg… - … on Robot Learning, 2022 - proceedings.mlr.press
This position paper proposes a fresh look at Reinforcement Learning (RL) from the
perspective of data-efficiency. RL has gone through three major stages: pure on-line RL …

Behavior priors for efficient reinforcement learning

D Tirumala, A Galashov, H Noh, L Hasenclever… - Journal of Machine …, 2022 - jmlr.org
As we deploy reinforcement learning agents to solve increasingly challenging problems,
methods that allow us to inject prior knowledge about the structure of the world and effective …