Model-based reinforcement learning with value-targeted regression

A Ayoub, Z Jia, C Szepesvari… - … on Machine Learning, 2020 - proceedings.mlr.press
This paper studies model-based reinforcement learning (RL) for regret minimization. We
focus on finite-horizon episodic RL where the transition model $P$ belongs to a known …

Learning near optimal policies with low inherent Bellman error

A Zanette, A Lazaric, M Kochenderfer… - International …, 2020 - proceedings.mlr.press
We study the exploration problem with approximate linear action-value functions in episodic
reinforcement learning under the notion of low inherent Bellman error, a condition normally …

Learning with good feature representations in bandits and in RL with a generative model

T Lattimore, C Szepesvari… - … conference on machine …, 2020 - proceedings.mlr.press
The construction in the recent paper by Du et al. [2019] implies that searching for a
near-optimal action in a bandit sometimes requires examining essentially all the actions, even if …

Reinforcement learning with general value function approximation: Provably efficient approach via bounded eluder dimension

R Wang, RR Salakhutdinov… - Advances in Neural …, 2020 - proceedings.neurips.cc
Value function approximation has demonstrated phenomenal empirical success in
reinforcement learning (RL). Nevertheless, despite a handful of recent progress on …

Exponential lower bounds for batch reinforcement learning: Batch RL can be exponentially harder than online RL

A Zanette - International Conference on Machine Learning, 2021 - proceedings.mlr.press
Several practical applications of reinforcement learning involve an agent learning from past
data without the possibility of further exploration. Often these applications require us to 1) …

Instabilities of offline RL with pre-trained neural representation

R Wang, Y Wu, R Salakhutdinov… - … on Machine Learning, 2021 - proceedings.mlr.press
In offline reinforcement learning (RL), we seek to utilize offline data to evaluate (or learn)
policies in scenarios where the data are collected from a distribution that substantially differs …

Model-based reinforcement learning with value-targeted regression

Z Jia, L Yang, C Szepesvari… - Learning for Dynamics …, 2020 - proceedings.mlr.press
Reinforcement learning (RL) applies to control problems with large state and action spaces,
hence it is natural to consider RL with a parametric model. In this paper we focus on finite …

Provably efficient reinforcement learning with general value function approximation

R Wang, R Salakhutdinov… - arXiv preprint arXiv …, 2020 - ask.qcloudimg.com
Value function approximation has demonstrated phenomenal empirical success in
reinforcement learning (RL). Nevertheless, despite a handful of recent progress on …

Adaptive discretization in online reinforcement learning

SR Sinclair, S Banerjee, CL Yu - Operations Research, 2023 - pubsonline.informs.org
Discretization-based approaches to solving online reinforcement learning problems are
studied extensively on applications such as resource allocation and cache management …

Sample-efficient reinforcement learning is feasible for linearly realizable MDPs with limited revisiting

G Li, Y Chen, Y Chi, Y Gu… - Advances in Neural …, 2021 - proceedings.neurips.cc
Low-complexity models such as linear function representation play a pivotal role in enabling
sample-efficient reinforcement learning (RL). The current paper pertains to a scenario with …