Regret bounds for information-directed reinforcement learning
B Hao, T Lattimore - Advances in neural information …, 2022 - proceedings.neurips.cc
Abstract Information-directed sampling (IDS) has revealed its potential as a data-efficient
algorithm for reinforcement learning (RL). However, theoretical understanding of IDS for …
An information-theoretic analysis of nonstationary bandit learning
In nonstationary bandit learning problems, the decision-maker must continually gather
information and adapt their action selection as the latent state of the environment evolves. In …
Improved Bayesian regret bounds for Thompson sampling in reinforcement learning
A Moradipari, M Pedramfar… - Advances in …, 2023 - proceedings.neurips.cc
In this paper, we prove state-of-the-art Bayesian regret bounds for Thompson Sampling in
reinforcement learning in a multitude of settings. We present a refined analysis of the …
Linear partial monitoring for sequential decision making: Algorithms, regret bounds and applications
Partial monitoring is an expressive framework for sequential decision-making with an
abundance of applications, including graph-structured and dueling bandits, dynamic pricing …
Nonstationary bandit learning via predictive sampling
Thompson sampling has proven effective across a wide range of stationary bandit
environments. However, as we demonstrate in this paper, it can perform poorly when …
Lifting the information ratio: An information-theoretic analysis of Thompson sampling for contextual bandits
We study the Bayesian regret of the renowned Thompson Sampling algorithm in contextual
bandits with binary losses and adversarially-selected contexts. We adapt the information …
Thompson sampling for high-dimensional sparse linear contextual bandits
We consider the stochastic linear contextual bandit problem with high-dimensional features.
We analyze the Thompson sampling algorithm using special classes of sparsity-inducing …
Bayesian reinforcement learning with limited cognitive load
All biological and artificial agents must act given limits on their ability to acquire and process
information. As such, a general theory of adaptive behavior should be able to account for the …
Thompson sampling for stochastic bandits with noisy contexts: An information-theoretic regret analysis
ST Jose, S Moothedath - arXiv preprint arXiv:2401.11565, 2024 - arxiv.org
We explore a stochastic contextual linear bandit problem where the agent observes a noisy,
corrupted version of the true context through a noise channel with an unknown noise …
A definition of non-stationary bandits
Despite the subject of non-stationary bandit learning having attracted much recent attention,
we have yet to identify a formal definition of non-stationarity that can consistently distinguish …