Regret bounds for information-directed reinforcement learning

B Hao, T Lattimore - Advances in neural information …, 2022 - proceedings.neurips.cc
Information-directed sampling (IDS) has revealed its potential as a data-efficient
algorithm for reinforcement learning (RL). However, theoretical understanding of IDS for …
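To make the entry concrete, here is a minimal, hedged sketch of an information-directed selection rule for a finite-armed bandit: pick the arm minimizing (expected regret)² divided by an information-gain term. The Gaussian posterior-sample setup and the posterior-variance proxy for information gain are assumptions of this demo, not the RL analysis in the paper above.

```python
import numpy as np

def ids_arm(posterior_samples):
    """Pick the arm minimizing (regret estimate)^2 / information-gain proxy.

    posterior_samples: (n_samples, n_arms) posterior draws of the arm means.
    """
    # Expected shortfall of each arm relative to the per-sample optimum.
    regret = (posterior_samples.max(axis=1, keepdims=True)
              - posterior_samples).mean(axis=0)
    # Posterior variance of each arm's mean as a crude information-gain
    # proxy (higher variance -> more left to learn about that arm).
    gain = posterior_samples.var(axis=0) + 1e-12
    return int(np.argmin(regret ** 2 / gain))

# Demo: arm 0 is well understood and clearly suboptimal; arms 1 and 2 are
# uncertain, so IDS should not waste pulls on arm 0.
rng = np.random.default_rng(1)
samples = rng.normal(loc=[0.2, 0.5, 0.55], scale=[0.01, 0.3, 0.3],
                     size=(4000, 3))
arm = ids_arm(samples)
```

The ratio trades off immediate regret against how much a pull teaches us, which is the core idea behind IDS-style rules.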

An information-theoretic analysis of nonstationary bandit learning

S Min, D Russo - International Conference on Machine …, 2023 - proceedings.mlr.press
In nonstationary bandit learning problems, the decision-maker must continually gather
information and adapt their action selection as the latent state of the environment evolves. In …

Improved Bayesian regret bounds for Thompson sampling in reinforcement learning

A Moradipari, M Pedramfar… - Advances in …, 2023 - proceedings.neurips.cc
In this paper, we prove state-of-the-art Bayesian regret bounds for Thompson Sampling in
reinforcement learning in a multitude of settings. We present a refined analysis of the …
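As background for the entries on Thompson sampling, a minimal sketch of the vanilla algorithm on a Bernoulli bandit follows; the Beta(1, 1) priors, the arm means, and the horizon are demo assumptions, not the refined RL analysis of the paper above.

```python
import numpy as np

# Thompson sampling for a 3-armed Bernoulli bandit with Beta posteriors.
rng = np.random.default_rng(0)
true_means = np.array([0.3, 0.5, 0.7])  # unknown to the learner
n_arms = len(true_means)
alpha = np.ones(n_arms)  # Beta posterior: 1 + successes per arm
beta = np.ones(n_arms)   # Beta posterior: 1 + failures per arm

for t in range(2000):
    theta = rng.beta(alpha, beta)   # sample a plausible mean per arm
    arm = int(np.argmax(theta))     # act greedily on the sample
    reward = rng.binomial(1, true_means[arm])
    alpha[arm] += reward
    beta[arm] += 1 - reward

best_arm = int(np.argmax(alpha / (alpha + beta)))
```

Sampling from the posterior and acting greedily on the sample is what makes exploration automatic: uncertain arms occasionally draw high samples and get pulled.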

Contextual information-directed sampling

B Hao, T Lattimore, C Qin - International Conference on …, 2022 - proceedings.mlr.press
Information-directed sampling (IDS) has recently demonstrated its potential as a
data-efficient reinforcement learning algorithm. However, it is still unclear what is the right …

Contexts can be cheap: Solving stochastic contextual bandits with linear bandit algorithms

OA Hanna, L Yang, C Fragouli - The Thirty Sixth Annual …, 2023 - proceedings.mlr.press
In this paper, we address the stochastic contextual linear bandit problem, where a decision
maker is provided a context (a random set of actions drawn from a distribution). The …

Leveraging demonstrations to improve online learning: Quality matters

B Hao, R Jain, T Lattimore… - … on Machine Learning, 2023 - proceedings.mlr.press
We investigate the extent to which offline demonstration data can improve online learning. It
is natural to expect some improvement, but the question is how, and by how much? We …

Thompson sampling for high-dimensional sparse linear contextual bandits

S Chakraborty, S Roy, A Tewari - … Conference on Machine …, 2023 - proceedings.mlr.press
We consider the stochastic linear contextual bandit problem with high-dimensional features.
We analyze the Thompson sampling algorithm using special classes of sparsity-inducing …
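A hedged sketch of Gaussian Thompson sampling for a linear contextual bandit follows, using a plain ridge posterior rather than the sparsity-inducing priors the paper above analyzes; the dimensions, noise level, and Gaussian contexts are demo assumptions.

```python
import numpy as np

# Linear Thompson sampling: maintain a ridge/Gaussian posterior over the
# parameter vector, sample from it, and play the best arm for the sample.
rng = np.random.default_rng(2)
d, n_arms, T, lam, sigma = 5, 10, 500, 1.0, 0.1
theta_star = rng.normal(size=d) / np.sqrt(d)  # unknown true parameter

V = lam * np.eye(d)  # regularized Gram matrix of played contexts
b = np.zeros(d)      # sum of reward-weighted contexts

for t in range(T):
    X = rng.normal(size=(n_arms, d))  # fresh contexts each round
    V_inv = np.linalg.inv(V)
    theta_hat = V_inv @ b             # ridge posterior mean
    theta_tilde = rng.multivariate_normal(theta_hat, sigma**2 * V_inv)
    a = int(np.argmax(X @ theta_tilde))  # act on the posterior sample
    r = X[a] @ theta_star + sigma * rng.normal()
    V += np.outer(X[a], X[a])
    b += r * X[a]

err = np.linalg.norm(theta_hat - theta_star)
```

In the high-dimensional sparse regime the paper studies, the isotropic Gaussian posterior above would be replaced by a sparsity-inducing prior; this sketch only shows the shared sampling skeleton.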

Thompson sampling for stochastic bandits with noisy contexts: An information-theoretic regret analysis

ST Jose, S Moothedath - arXiv preprint arXiv:2401.11565, 2024 - arxiv.org
We explore a stochastic contextual linear bandit problem where the agent observes a noisy,
corrupted version of the true context through a noise channel with an unknown noise …

Stochastic contextual bandits with long horizon rewards

Y Qin, Y Li, F Pasqualetti, M Fazel… - Proceedings of the AAAI …, 2023 - ojs.aaai.org
The growing interest in complex decision-making and language modeling problems
highlights the importance of sample-efficient learning over very long horizons. This work …

Variance-aware sparse linear bandits

Y Dai, R Wang, SS Du - arXiv preprint arXiv:2205.13450, 2022 - arxiv.org
It is well-known that for sparse linear bandits, when ignoring the dependency on sparsity
which is much smaller than the ambient dimension, the worst-case minimax regret is …