- Academic resource search

Bilinear classes: A structural framework for provable generalization in RL

S Du, S Kakade, J Lee, S Lovett… - International …, 2021 - proceedings.mlr.press

Abstract This work introduces Bilinear Classes, a new structural framework, which permit
generalization in reinforcement learning in a wide variety of settings through the use of …

Cited by 237 · Related articles · All 8 versions

[PDF] arxiv.org

Randomized ensembled double Q-learning: Learning fast without a model

X Chen, C Wang, Z Zhou, K Ross - arXiv preprint arXiv:2101.05982, 2021 - arxiv.org

Using a high Update-To-Data (UTD) ratio, model-based methods have recently achieved
much higher sample efficiency than previous model-free methods for continuous-action DRL …

Cited by 266 · Related articles · All 7 versions

[PDF] neurips.cc

Deep reinforcement learning with plasticity injection

E Nikishin, J Oh, G Ostrovski, C Lyle… - Advances in …, 2024 - proceedings.neurips.cc

A growing body of evidence suggests that neural networks employed in deep reinforcement
learning (RL) gradually lose their plasticity, the ability to learn from new data; however, the …

Cited by 35 · Related articles · All 6 versions

[PDF] neurips.cc

Safe reinforcement learning by imagining the near future

G Thomas, Y Luo, T Ma - Advances in Neural Information …, 2021 - proceedings.neurips.cc

Safe reinforcement learning is a promising path toward applying reinforcement learning
algorithms to real-world problems, where suboptimal behaviors may lead to actual negative …

Cited by 77 · Related articles · All 7 versions

[PDF] neurips.cc

VRL3: A data-driven framework for visual deep reinforcement learning

C Wang, X Luo, K Ross, D Li - Advances in Neural …, 2022 - proceedings.neurips.cc

We propose VRL3, a powerful data-driven framework with a simple design for solving
challenging visual deep reinforcement learning (DRL) tasks. We analyze a number of major …

Cited by 41 · Related articles · All 11 versions

[PDF] neurips.cc

Learning barrier certificates: Towards safe reinforcement learning with zero training-time violations

Y Luo, T Ma - Advances in Neural Information Processing …, 2021 - proceedings.neurips.cc

Training-time safety violations have been a major concern when we deploy reinforcement
learning algorithms in the real world. This paper explores the possibility of safe RL …

Cited by 46 · Related articles · All 6 versions

[PDF] neurips.cc

When is agnostic reinforcement learning statistically tractable?

Z Jia, G Li, A Rakhlin, A Sekhari… - Advances in Neural …, 2024 - proceedings.neurips.cc

We study the problem of agnostic PAC reinforcement learning (RL): given a policy class $\Pi$,
how many rounds of interaction with an unknown MDP (with a potentially large state and …

Cited by 4 · Related articles · All 6 versions

[PDF] arxiv.org

Model-based visual planning with self-supervised functional distances

S Tian, S Nair, F Ebert, S Dasari, B Eysenbach… - arXiv preprint arXiv …, 2020 - arxiv.org

A generalist robot must be able to complete a variety of tasks in its environment. One
appealing way to specify each task is in terms of a goal observation. However, learning goal …

Cited by 61 · Related articles · All 5 versions

[PDF] arxiv.org

Scaling active inference

A Tschantz, M Baltieri, AK Seth… - 2020 international joint …, 2020 - ieeexplore.ieee.org

In reinforcement learning (RL), agents often operate in partially observed and uncertain
environments. Model-based RL suggests that this is best achieved by learning and …

Cited by 77 · Related articles · All 6 versions

[PDF] neurips.cc

Provable model-based nonlinear bandit and reinforcement learning: Shelve optimism, embrace virtual curvature

K Dong, J Yang, T Ma - Advances in neural information …, 2021 - proceedings.neurips.cc

This paper studies model-based bandit and reinforcement learning (RL) with nonlinear
function approximations. We propose to study convergence to approximate local maxima …

Cited by 41 · Related articles · All 8 versions