Regret bounds for information-directed reinforcement learning

B Hao, T Lattimore - Advances in neural information …, 2022 - proceedings.neurips.cc
Information-directed sampling (IDS) has revealed its potential as a data-efficient
algorithm for reinforcement learning (RL). However, theoretical understanding of IDS for …
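
As background (this is the standard Russo–Van Roy formulation from the IDS literature, not taken from the snippet above), IDS plays the action distribution that minimizes the information ratio: squared one-step expected regret per bit of information gained about the optimal action. A sketch of the defining quantities:

```latex
% Information ratio of an action distribution \pi at time t:
% squared expected one-step regret \Delta_t over the information
% the resulting observation Y_{t,A} carries about the optimum A^*.
\Gamma_t(\pi) \;=\;
  \frac{\big(\mathbb{E}_{A \sim \pi}[\Delta_t(A)]\big)^2}
       {\mathbb{E}_{A \sim \pi}\,\mathrm{I}_t\!\big(A^{*};\, Y_{t,A}\big)},
\qquad
\pi_t^{\mathrm{IDS}} \;=\; \operatorname*{arg\,min}_{\pi}\; \Gamma_t(\pi).
```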

An information-theoretic analysis of nonstationary bandit learning

S Min, D Russo - International Conference on Machine …, 2023 - proceedings.mlr.press
In nonstationary bandit learning problems, the decision-maker must continually gather
information and adapt their action selection as the latent state of the environment evolves. In …

Improved Bayesian regret bounds for Thompson sampling in reinforcement learning

A Moradipari, M Pedramfar… - Advances in …, 2023 - proceedings.neurips.cc
In this paper, we prove state-of-the-art Bayesian regret bounds for Thompson Sampling in
reinforcement learning in a multitude of settings. We present a refined analysis of the …

Linear partial monitoring for sequential decision making: Algorithms, regret bounds and applications

J Kirschner, T Lattimore, A Krause - The Journal of Machine Learning …, 2023 - dl.acm.org
Partial monitoring is an expressive framework for sequential decision-making with an
abundance of applications, including graph-structured and dueling bandits, dynamic pricing …

Nonstationary bandit learning via predictive sampling

Y Liu, B Van Roy, K Xu - International Conference on …, 2023 - proceedings.mlr.press
Thompson sampling has proven effective across a wide range of stationary bandit
environments. However, as we demonstrate in this paper, it can perform poorly when …
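
Several of the entries above analyze Thompson sampling; as a concrete reference point, here is a minimal sketch for a stationary Bernoulli bandit with Beta(1, 1) priors. The function name, inputs, and horizon are illustrative assumptions, not drawn from any of the listed papers.

```python
import random

def thompson_bernoulli(true_means, horizon, seed=0):
    """Thompson sampling on a Bernoulli bandit with Beta(1, 1) priors.

    Illustrative sketch only: `true_means` and `horizon` are made-up
    inputs, not tied to any experiment in the papers above.
    Returns the number of pulls of each arm.
    """
    rng = random.Random(seed)
    k = len(true_means)
    alpha = [1] * k  # posterior successes + 1
    beta = [1] * k   # posterior failures + 1
    pulls = [0] * k
    for _ in range(horizon):
        # Sample a mean for each arm from its Beta posterior, play the argmax.
        samples = [rng.betavariate(alpha[i], beta[i]) for i in range(k)]
        arm = max(range(k), key=lambda i: samples[i])
        reward = 1 if rng.random() < true_means[arm] else 0
        alpha[arm] += reward
        beta[arm] += 1 - reward
        pulls[arm] += 1
    return pulls

pulls = thompson_bernoulli([0.2, 0.5, 0.8], horizon=2000)
```

In a stationary environment the posterior concentrates and the best arm is pulled almost exclusively; the nonstationary papers above study exactly the settings where this concentration becomes a liability.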

Lifting the information ratio: An information-theoretic analysis of Thompson sampling for contextual bandits

G Neu, I Olkhovskaia, M Papini… - Advances in Neural …, 2022 - proceedings.neurips.cc
We study the Bayesian regret of the renowned Thompson Sampling algorithm in contextual
bandits with binary losses and adversarially-selected contexts. We adapt the information …

Thompson sampling for high-dimensional sparse linear contextual bandits

S Chakraborty, S Roy, A Tewari - … Conference on Machine …, 2023 - proceedings.mlr.press
We consider the stochastic linear contextual bandit problem with high-dimensional features.
We analyze the Thompson sampling algorithm using special classes of sparsity-inducing …

Bayesian reinforcement learning with limited cognitive load

D Arumugam, MK Ho, ND Goodman, B Van Roy - Open Mind, 2024 - direct.mit.edu
All biological and artificial agents must act given limits on their ability to acquire and process
information. As such, a general theory of adaptive behavior should be able to account for the …

Thompson sampling for stochastic bandits with noisy contexts: An information-theoretic regret analysis

ST Jose, S Moothedath - arXiv preprint arXiv:2401.11565, 2024 - arxiv.org
We explore a stochastic contextual linear bandit problem where the agent observes a noisy,
corrupted version of the true context through a noise channel with an unknown noise …

A definition of non-stationary bandits

Y Liu, X Kuang, B Van Roy - arXiv preprint arXiv:2302.12202, 2023 - arxiv.org
Despite the subject of non-stationary bandit learning having attracted much recent attention,
we have yet to identify a formal definition of non-stationarity that can consistently distinguish …