A tour of reinforcement learning: The view from continuous control

B Recht - Annual Review of Control, Robotics, and Autonomous …, 2019 - annualreviews.org
This article surveys reinforcement learning from the perspective of optimization and control,
with a focus on continuous control applications. It reviews the general formulation …

[图书][B] Bandit algorithms

T Lattimore, C Szepesvári - 2020 - books.google.com
Decision-making in the face of uncertainty is a significant challenge in machine learning,
and the multi-armed bandit model is a commonly used framework to address it. This …

Model-based reinforcement learning with value-targeted regression

A Ayoub, Z Jia, C Szepesvari… - … on Machine Learning, 2020 - proceedings.mlr.press
This paper studies model-based reinforcement learning (RL) for regret minimization. We
focus on finite-horizon episodic RL where the transition model $ P $ belongs to a known …

Bayesian reinforcement learning: A survey

M Ghavamzadeh, S Mannor, J Pineau… - … and Trends® in …, 2015 - nowpublishers.com
Bayesian methods for machine learning have been widely investigated, yielding principled
methods for incorporating prior information into inference algorithms. In this survey, we …

Regret bounds for robust adaptive control of the linear quadratic regulator

S Dean, H Mania, N Matni… - Advances in Neural …, 2018 - proceedings.neurips.cc
We consider adaptive control of the Linear Quadratic Regulator (LQR), where an unknown
linear system is controlled subject to quadratic costs. Leveraging recent developments in the …

Data center cooling using model-predictive control

N Lazic, C Boutilier, T Lu, E Wong… - Advances in …, 2018 - proceedings.neurips.cc
Despite impressive recent advances in reinforcement learning (RL), its deployment in real-
world physical systems is often complicated by unexpected events, limited data, and the …

Optimistic posterior sampling for reinforcement learning: worst-case regret bounds

S Agrawal, R Jia - Advances in Neural Information …, 2017 - proceedings.neurips.cc
We present an algorithm based on posterior sampling (aka Thompson sampling) that
achieves near-optimal worst-case regret bounds when the underlying Markov Decision …

Efficient exploration through bayesian deep q-networks

K Azizzadenesheli, E Brunskill… - 2018 Information …, 2018 - ieeexplore.ieee.org
We propose Bayesian Deep Q-Network (BDQN), a practical Thompson sampling based
Reinforcement Learning (RL) Algorithm. Thompson sampling allows for targeted exploration …

Bayesian decision-making under misspecified priors with applications to meta-learning

M Simchowitz, C Tosh… - Advances in …, 2021 - proceedings.neurips.cc
Thompson sampling and other Bayesian sequential decision-making algorithms are among
the most popular approaches to tackle explore/exploit trade-offs in (contextual) bandits. The …

Learning unknown markov decision processes: A thompson sampling approach

Y Ouyang, M Gagrani, A Nayyar… - Advances in neural …, 2017 - proceedings.neurips.cc
We consider the problem of learning an unknown Markov Decision Process (MDP) that is
weakly communicating in the infinite horizon setting. We propose a Thompson Sampling …