Stop regressing: Training value functions via classification for scalable deep RL

J Farebrother, J Orbay, Q Vuong, AA Taïga… - arXiv preprint arXiv …, 2024 - arxiv.org
Value functions are a central component of deep reinforcement learning (RL). These
functions, parameterized by neural networks, are trained using a mean squared error …

When Models Meet Data: Pragmatic Robot Learning with Model-based Optimization

M Bhardwaj - 2024 - digital.lib.washington.edu
Autonomous robots operating in complex and dynamic real-world scenarios must exhibit fast
and reactive behaviors to adapt to environment changes, and learn to improve their …