Limiting extrapolation in linear approximate value iteration

A Ayoub, Z Jia, C Szepesvari… - … on Machine Learning, 2020 - proceedings.mlr.press

This paper studies model-based reinforcement learning (RL) for regret minimization. We
focus on finite-horizon episodic RL where the transition model $P$ belongs to a known …

Cited by 329 · Related articles · All 8 versions

Learning near optimal policies with low inherent Bellman error

A Zanette, A Lazaric, M Kochenderfer… - International …, 2020 - proceedings.mlr.press

We study the exploration problem with approximate linear action-value functions in episodic
reinforcement learning under the notion of low inherent Bellman error, a condition normally …

Cited by 240 · Related articles · All 5 versions

Learning with good feature representations in bandits and in RL with a generative model

T Lattimore, C Szepesvari… - … conference on machine …, 2020 - proceedings.mlr.press

The construction in the recent paper by Du et al. [2019] implies that searching for a
near-optimal action in a bandit sometimes requires examining essentially all the actions, even if …

Cited by 194 · Related articles · All 7 versions

Reinforcement learning with general value function approximation: Provably efficient approach via bounded eluder dimension

R Wang, RR Salakhutdinov… - Advances in Neural …, 2020 - proceedings.neurips.cc

Value function approximation has demonstrated phenomenal empirical success in
reinforcement learning (RL). Nevertheless, despite a handful of recent progress on …

Cited by 168 · Related articles · All 6 versions

Exponential lower bounds for batch reinforcement learning: Batch RL can be exponentially harder than online RL

A Zanette - International Conference on Machine Learning, 2021 - proceedings.mlr.press

Several practical applications of reinforcement learning involve an agent learning from past
data without the possibility of further exploration. Often these applications require us to 1) …

Cited by 88 · Related articles · All 4 versions

Instabilities of offline RL with pre-trained neural representation

R Wang, Y Wu, R Salakhutdinov… - … on Machine Learning, 2021 - proceedings.mlr.press

In offline reinforcement learning (RL), we seek to utilize offline data to evaluate (or learn)
policies in scenarios where the data are collected from a distribution that substantially differs …

Cited by 51 · Related articles · All 8 versions

Model-based reinforcement learning with value-targeted regression

Z Jia, L Yang, C Szepesvari… - Learning for Dynamics …, 2020 - proceedings.mlr.press

Reinforcement learning (RL) applies to control problems with large state and action spaces,
hence it is natural to consider RL with a parametric model. In this paper we focus on finite …

Cited by 71 · Related articles · All 4 versions

[PDF] Provably efficient reinforcement learning with general value function approximation

R Wang, R Salakhutdinov… - arXiv preprint arXiv …, 2020 - ask.qcloudimg.com

Value function approximation has demonstrated phenomenal empirical success in
reinforcement learning (RL). Nevertheless, despite a handful of recent progress on …

Cited by 64 · Related articles

Adaptive discretization in online reinforcement learning

SR Sinclair, S Banerjee, CL Yu - Operations Research, 2023 - pubsonline.informs.org

Discretization-based approaches to solving online reinforcement learning problems are
studied extensively on applications such as resource allocation and cache management …

Cited by 17 · Related articles · All 6 versions

Sample-efficient reinforcement learning is feasible for linearly realizable MDPs with limited revisiting

G Li, Y Chen, Y Chi, Y Gu… - Advances in Neural …, 2021 - proceedings.neurips.cc

Low-complexity models such as linear function representation play a pivotal role in enabling
sample-efficient reinforcement learning (RL). The current paper pertains to a scenario with …

Cited by 31 · Related articles · All 12 versions