Regret bounds for information-directed reinforcement learning

B Hao, T Lattimore - Advances in neural information …, 2022 - proceedings.neurips.cc
Information-directed sampling (IDS) has revealed its potential as a data-efficient
algorithm for reinforcement learning (RL). However, theoretical understanding of IDS for …
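
As background (this is the standard Russo–Van Roy formulation from the IDS literature, not taken from the snippet above), IDS plays the action distribution that minimizes the information ratio: squared one-step expected regret per bit of information gained about the optimal action. A sketch of the defining quantities:

```latex
% Information ratio of an action distribution \pi at time t:
% squared expected one-step regret \Delta_t over the information
% the resulting observation Y_{t,A} carries about the optimum A^*.
\Gamma_t(\pi) \;=\;
  \frac{\big(\mathbb{E}_{A \sim \pi}[\Delta_t(A)]\big)^2}
       {\mathbb{E}_{A \sim \pi}\,\mathrm{I}_t\!\big(A^{*};\, Y_{t,A}\big)},
\qquad
\pi_t^{\mathrm{IDS}} \;=\; \operatorname*{arg\,min}_{\pi}\; \Gamma_t(\pi).
```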

An information-theoretic analysis of nonstationary bandit learning

S Min, D Russo - International Conference on Machine …, 2023 - proceedings.mlr.press
In nonstationary bandit learning problems, the decision-maker must continually gather
information and adapt their action selection as the latent state of the environment evolves. In …

Improved Bayesian regret bounds for Thompson sampling in reinforcement learning

A Moradipari, M Pedramfar… - Advances in …, 2023 - proceedings.neurips.cc
In this paper, we prove state-of-the-art Bayesian regret bounds for Thompson Sampling in
reinforcement learning in a multitude of settings. We present a refined analysis of the …

Linear partial monitoring for sequential decision making: Algorithms, regret bounds and applications

J Kirschner, T Lattimore, A Krause - The Journal of Machine Learning …, 2023 - dl.acm.org
Partial monitoring is an expressive framework for sequential decision-making with an
abundance of applications, including graph-structured and dueling bandits, dynamic pricing …

Nonstationary bandit learning via predictive sampling

Y Liu, B Van Roy, K Xu - International Conference on …, 2023 - proceedings.mlr.press
Thompson sampling has proven effective across a wide range of stationary bandit
environments. However, as we demonstrate in this paper, it can perform poorly when …
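
Several of the entries above analyze Thompson sampling; as a concrete reference point, here is a minimal sketch for a stationary Bernoulli bandit with Beta(1, 1) priors. The function name, inputs, and horizon are illustrative assumptions, not drawn from any of the listed papers.

```python
import random

def thompson_bernoulli(true_means, horizon, seed=0):
    """Thompson sampling on a Bernoulli bandit with Beta(1, 1) priors.

    Illustrative sketch only: `true_means` and `horizon` are made-up
    inputs, not tied to any experiment in the papers above.
    Returns the number of pulls of each arm.
    """
    rng = random.Random(seed)
    k = len(true_means)
    alpha = [1] * k  # posterior successes + 1
    beta = [1] * k   # posterior failures + 1
    pulls = [0] * k
    for _ in range(horizon):
        # Sample a mean for each arm from its Beta posterior, play the argmax.
        samples = [rng.betavariate(alpha[i], beta[i]) for i in range(k)]
        arm = max(range(k), key=lambda i: samples[i])
        reward = 1 if rng.random() < true_means[arm] else 0
        alpha[arm] += reward
        beta[arm] += 1 - reward
        pulls[arm] += 1
    return pulls

pulls = thompson_bernoulli([0.2, 0.5, 0.8], horizon=2000)
```

In a stationary environment the posterior concentrates and the best arm is pulled almost exclusively; the nonstationary papers above study exactly the settings where this concentration becomes a liability.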

Lifting the information ratio: An information-theoretic analysis of Thompson sampling for contextual bandits

G Neu, I Olkhovskaia, M Papini… - Advances in Neural …, 2022 - proceedings.neurips.cc
We study the Bayesian regret of the renowned Thompson Sampling algorithm in contextual
bandits with binary losses and adversarially-selected contexts. We adapt the information …

Thompson sampling for high-dimensional sparse linear contextual bandits

S Chakraborty, S Roy, A Tewari - … Conference on Machine …, 2023 - proceedings.mlr.press
We consider the stochastic linear contextual bandit problem with high-dimensional features.
We analyze the Thompson sampling algorithm using special classes of sparsity-inducing …

Bayesian reinforcement learning with limited cognitive load

D Arumugam, MK Ho, ND Goodman, B Van Roy - Open Mind, 2024 - direct.mit.edu
All biological and artificial agents must act given limits on their ability to acquire and process
information. As such, a general theory of adaptive behavior should be able to account for the …

Thompson sampling for stochastic bandits with noisy contexts: An information-theoretic regret analysis

ST Jose, S Moothedath - arXiv preprint arXiv:2401.11565, 2024 - arxiv.org
We explore a stochastic contextual linear bandit problem where the agent observes a noisy,
corrupted version of the true context through a noise channel with an unknown noise …

A definition of non-stationary bandits

Y Liu, X Kuang, B Van Roy - arXiv preprint arXiv:2302.12202, 2023 - arxiv.org
Despite the subject of non-stationary bandit learning having attracted much recent attention,
we have yet to identify a formal definition of non-stationarity that can consistently distinguish …