
A tour of reinforcement learning: The view from continuous control

B Recht - Annual Review of Control, Robotics, and Autonomous …, 2019 - annualreviews.org

This article surveys reinforcement learning from the perspective of optimization and control,
with a focus on continuous control applications. It reviews the general formulation …

Cited by 776 · All 5 versions

[PDF] tor-lattimore.com

[BOOK] Bandit algorithms

T Lattimore, C Szepesvári - 2020 - books.google.com

Decision-making in the face of uncertainty is a significant challenge in machine learning,
and the multi-armed bandit model is a commonly used framework to address it. This …

Cited by 3212 · All 9 versions

[PDF] mlr.press

Model-based reinforcement learning with value-targeted regression

A Ayoub, Z Jia, C Szepesvari… - … on Machine Learning, 2020 - proceedings.mlr.press

This paper studies model-based reinforcement learning (RL) for regret minimization. We
focus on finite-horizon episodic RL where the transition model $P$ belongs to a known …

Cited by 349 · All 8 versions

[PDF] nowpublishers.com

Bayesian reinforcement learning: A survey

M Ghavamzadeh, S Mannor, J Pineau… - … and Trends® in …, 2015 - nowpublishers.com

Bayesian methods for machine learning have been widely investigated, yielding principled
methods for incorporating prior information into inference algorithms. In this survey, we …

Cited by 580 · All 11 versions

[PDF] neurips.cc

Regret bounds for robust adaptive control of the linear quadratic regulator

S Dean, H Mania, N Matni… - Advances in Neural …, 2018 - proceedings.neurips.cc

We consider adaptive control of the Linear Quadratic Regulator (LQR), where an unknown
linear system is controlled subject to quadratic costs. Leveraging recent developments in the …

Cited by 313 · All 7 versions

[PDF] neurips.cc

Data center cooling using model-predictive control

N Lazic, C Boutilier, T Lu, E Wong… - Advances in …, 2018 - proceedings.neurips.cc

Despite impressive recent advances in reinforcement learning (RL), its deployment in
real-world physical systems is often complicated by unexpected events, limited data, and the …

Cited by 243 · All 12 versions

[PDF] neurips.cc

Optimistic posterior sampling for reinforcement learning: worst-case regret bounds

S Agrawal, R Jia - Advances in Neural Information …, 2017 - proceedings.neurips.cc

We present an algorithm based on posterior sampling (aka Thompson sampling) that
achieves near-optimal worst-case regret bounds when the underlying Markov Decision …

Cited by 253 · All 13 versions

[PDF] arxiv.org

Efficient exploration through Bayesian deep Q-networks

K Azizzadenesheli, E Brunskill… - 2018 Information …, 2018 - ieeexplore.ieee.org

We propose Bayesian Deep Q-Network (BDQN), a practical Thompson sampling based
Reinforcement Learning (RL) Algorithm. Thompson sampling allows for targeted exploration …

Cited by 209 · All 13 versions

[PDF] neurips.cc

Bayesian decision-making under misspecified priors with applications to meta-learning

M Simchowitz, C Tosh… - Advances in …, 2021 - proceedings.neurips.cc

Thompson sampling and other Bayesian sequential decision-making algorithms are among
the most popular approaches to tackle explore/exploit trade-offs in (contextual) bandits. The …

Cited by 57 · All 7 versions

[PDF] neurips.cc

Learning unknown Markov decision processes: A Thompson sampling approach

Y Ouyang, M Gagrani, A Nayyar… - Advances in Neural …, 2017 - proceedings.neurips.cc

We consider the problem of learning an unknown Markov Decision Process (MDP) that is
weakly communicating in the infinite horizon setting. We propose a Thompson Sampling …

Cited by 160 · All 7 versions