A tour of reinforcement learning: The view from continuous control
B Recht - Annual Review of Control, Robotics, and Autonomous …, 2019 - annualreviews.org
This article surveys reinforcement learning from the perspective of optimization and control,
with a focus on continuous control applications. It reviews the general formulation …
[BOOK][B] Bandit algorithms
T Lattimore, C Szepesvári - 2020 - books.google.com
Decision-making in the face of uncertainty is a significant challenge in machine learning,
and the multi-armed bandit model is a commonly used framework to address it. This …
Model-based reinforcement learning with value-targeted regression
This paper studies model-based reinforcement learning (RL) for regret minimization. We
focus on finite-horizon episodic RL where the transition model $ P $ belongs to a known …
Bayesian reinforcement learning: A survey
Bayesian methods for machine learning have been widely investigated, yielding principled
methods for incorporating prior information into inference algorithms. In this survey, we …
Regret bounds for robust adaptive control of the linear quadratic regulator
We consider adaptive control of the Linear Quadratic Regulator (LQR), where an unknown
linear system is controlled subject to quadratic costs. Leveraging recent developments in the …
Data center cooling using model-predictive control
Despite impressive recent advances in reinforcement learning (RL), its deployment in real-
world physical systems is often complicated by unexpected events, limited data, and the …
Optimistic posterior sampling for reinforcement learning: worst-case regret bounds
S Agrawal, R Jia - Advances in Neural Information …, 2017 - proceedings.neurips.cc
We present an algorithm based on posterior sampling (aka Thompson sampling) that
achieves near-optimal worst-case regret bounds when the underlying Markov Decision …
Efficient exploration through bayesian deep q-networks
K Azizzadenesheli, E Brunskill… - 2018 Information …, 2018 - ieeexplore.ieee.org
We propose Bayesian Deep Q-Network (BDQN), a practical Thompson sampling based
Reinforcement Learning (RL) Algorithm. Thompson sampling allows for targeted exploration …
Bayesian decision-making under misspecified priors with applications to meta-learning
M Simchowitz, C Tosh… - Advances in …, 2021 - proceedings.neurips.cc
Thompson sampling and other Bayesian sequential decision-making algorithms are among
the most popular approaches to tackle explore/exploit trade-offs in (contextual) bandits. The …
Learning unknown markov decision processes: A thompson sampling approach
We consider the problem of learning an unknown Markov Decision Process (MDP) that is
weakly communicating in the infinite horizon setting. We propose a Thompson Sampling …