Oracle-SAGE: Planning Ahead in Graph-Based Deep Reinforcement Learning
Joint European Conference on Machine Learning and Knowledge Discovery in Databases, 2022. Springer.
Abstract
Deep reinforcement learning (RL) commonly suffers from high sample complexity and poor generalisation, especially with high-dimensional (image-based) input. Where available (such as in some robotic control domains), low-dimensional vector inputs outperform their image-based counterparts, but it is challenging to represent complex dynamic environments in this manner. Relational reinforcement learning instead represents the world as a set of objects and the relations between them, offering a flexible yet expressive view which provides structural inductive biases to aid learning. Recently, relational RL methods have been extended with modern function approximation using graph neural networks (GNNs). However, inherent limitations in the processing model for GNNs result in decreased returns when important information is dispersed widely throughout the graph. We outline a hybrid learning and planning model which uses reinforcement learning to propose and select subgoals for a planning model to achieve. This includes a novel action selection mechanism and loss function to allow training around the non-differentiable planner. We demonstrate our algorithm's effectiveness on a range of domains, including MiniHack and a challenging extension of the classic taxi domain.
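To make the hybrid learning-and-planning idea concrete, below is a minimal sketch of the general pattern the abstract describes: a learned policy proposes a subgoal, a non-differentiable symbolic planner works out how to reach it, and the policy is trained with a score-function (REINFORCE-style) estimator so that gradients never need to pass through the planner. Everything here is an assumption for illustration: the BFS planner, the linear logits standing in for the paper's GNN policy, and all names (bfs_plan, EDGES, etc.) are hypothetical; the paper's actual action selection mechanism and loss function differ.

```python
# Illustrative sketch only: RL proposes a subgoal, a non-differentiable
# planner achieves it, and REINFORCE trains the proposer. This is not the
# paper's Oracle-SAGE algorithm; names and components are hypothetical.
import math
import random

# Toy graph environment: agent starts at START, reward lies at GOAL.
EDGES = {0: [1, 2], 1: [0, 3], 2: [0, 3], 3: [1, 2, 4], 4: [3]}
START, GOAL = 0, 4

def bfs_plan(start, subgoal):
    """Non-differentiable symbolic planner: shortest path via BFS."""
    frontier, parents = [start], {start: None}
    while frontier:
        node = frontier.pop(0)
        if node == subgoal:
            path = []
            while node is not None:
                path.append(node)
                node = parents[node]
            return list(reversed(path))
        for nxt in EDGES[node]:
            if nxt not in parents:
                parents[nxt] = node
                frontier.append(nxt)
    return [start]  # subgoal unreachable: stay put

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    z = sum(exps)
    return [e / z for e in exps]

# Policy: one logit per candidate subgoal node (a linear read-out stands
# in for a GNN over the object graph).
logits = [0.0] * len(EDGES)
LR = 0.5

for episode in range(200):
    probs = softmax(logits)
    subgoal = random.choices(range(len(EDGES)), weights=probs)[0]
    path = bfs_plan(START, subgoal)
    # Reward: +1 if executing the plan reaches GOAL, minus a step cost.
    reward = (1.0 if path[-1] == GOAL else 0.0) - 0.05 * (len(path) - 1)
    # REINFORCE update: reward * grad log pi(subgoal). Only the subgoal
    # distribution is differentiated; the planner itself never is.
    for i in range(len(logits)):
        grad_logp = (1.0 if i == subgoal else 0.0) - probs[i]
        logits[i] += LR * reward * grad_logp

print("Learned subgoal distribution:", [round(p, 2) for p in softmax(logits)])
```

Under these toy assumptions the policy concentrates probability on the subgoal node whose plan reaches the goal, which is the sense in which a score-function loss lets learning proceed "around" a black-box planner.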