Regret bounds for information-directed reinforcement learning
B Hao, T Lattimore - Advances in neural information …, 2022 - proceedings.neurips.cc
Abstract Information-directed sampling (IDS) has revealed its potential as a data-efficient
algorithm for reinforcement learning (RL). However, theoretical understanding of IDS for …
An information-theoretic analysis of nonstationary bandit learning
In nonstationary bandit learning problems, the decision-maker must continually gather
information and adapt their action selection as the latent state of the environment evolves. In …
Improved Bayesian regret bounds for Thompson sampling in reinforcement learning
A Moradipari, M Pedramfar… - Advances in …, 2023 - proceedings.neurips.cc
In this paper, we prove state-of-the-art Bayesian regret bounds for Thompson Sampling in
reinforcement learning in a multitude of settings. We present a refined analysis of the …
Contextual information-directed sampling
Abstract Information-directed sampling (IDS) has recently demonstrated its potential as a
data-efficient reinforcement learning algorithm. However, it is still unclear what is the right …
Contexts can be cheap: Solving stochastic contextual bandits with linear bandit algorithms
In this paper, we address the stochastic contextual linear bandit problem, where a decision
maker is provided a context (a random set of actions drawn from a distribution). The …
Leveraging demonstrations to improve online learning: Quality matters
We investigate the extent to which offline demonstration data can improve online learning. It
is natural to expect some improvement, but the question is how, and by how much? We …
Thompson sampling for high-dimensional sparse linear contextual bandits
We consider the stochastic linear contextual bandit problem with high-dimensional features.
We analyze the Thompson sampling algorithm using special classes of sparsity-inducing …
Thompson sampling for stochastic bandits with noisy contexts: An information-theoretic regret analysis
ST Jose, S Moothedath - arXiv preprint arXiv:2401.11565, 2024 - arxiv.org
We explore a stochastic contextual linear bandit problem where the agent observes a noisy,
corrupted version of the true context through a noise channel with an unknown noise …
Stochastic contextual bandits with long horizon rewards
The growing interest in complex decision-making and language modeling problems
highlights the importance of sample-efficient learning over very long horizons. This work …
Variance-aware sparse linear bandits
It is well-known that for sparse linear bandits, when ignoring the dependency on sparsity
which is much smaller than the ambient dimension, the worst-case minimax regret is …