Regret bounds for information-directed reinforcement learning
B Hao, T Lattimore - Advances in neural information …, 2022 - proceedings.neurips.cc
Abstract Information-directed sampling (IDS) has revealed its potential as a data-efficient
algorithm for reinforcement learning (RL). However, theoretical understanding of IDS for …
An information-theoretic analysis of nonstationary bandit learning
In nonstationary bandit learning problems, the decision-maker must continually gather
information and adapt their action selection as the latent state of the environment evolves. In …
Improved Bayesian regret bounds for Thompson sampling in reinforcement learning
A Moradipari, M Pedramfar… - Advances in …, 2023 - proceedings.neurips.cc
In this paper, we prove state-of-the-art Bayesian regret bounds for Thompson Sampling in
reinforcement learning in a multitude of settings. We present a refined analysis of the …
Contextual information-directed sampling
Abstract Information-directed sampling (IDS) has recently demonstrated its potential as a
data-efficient reinforcement learning algorithm. However, it is still unclear what is the right …
Contexts can be cheap: Solving stochastic contextual bandits with linear bandit algorithms
In this paper, we address the stochastic contextual linear bandit problem, where a decision
maker is provided a context (a random set of actions drawn from a distribution). The …
Leveraging demonstrations to improve online learning: Quality matters
We investigate the extent to which offline demonstration data can improve online learning. It
is natural to expect some improvement, but the question is how, and by how much? We …
Thompson sampling for high-dimensional sparse linear contextual bandits
We consider the stochastic linear contextual bandit problem with high-dimensional features.
We analyze the Thompson sampling algorithm using special classes of sparsity-inducing …
Thompson sampling for stochastic bandits with noisy contexts: An information-theoretic regret analysis
ST Jose, S Moothedath - arXiv preprint arXiv:2401.11565, 2024 - arxiv.org
We explore a stochastic contextual linear bandit problem where the agent observes a noisy,
corrupted version of the true context through a noise channel with an unknown noise …
Stochastic contextual bandits with long horizon rewards
The growing interest in complex decision-making and language modeling problems
highlights the importance of sample-efficient learning over very long horizons. This work …
Variance-aware sparse linear bandits
It is well-known that for sparse linear bandits, when ignoring the dependency on sparsity
which is much smaller than the ambient dimension, the worst-case minimax regret is …