First-order regret in reinforcement learning with linear function approximation: A robust estimation approach
AJ Wagenmaker, Y Chen… - International …, 2022 - proceedings.mlr.press
Obtaining first-order regret bounds—regret bounds scaling not as the worst-case but with
some measure of the performance of the optimal policy on a given instance—is a core …
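For reference, a minimal sketch of the generic shape of such a bound (not this paper's exact statement; the notation $L^\star$ is introduced here): if the optimal policy incurs cumulative loss $L^\star$ over $T$ rounds, a first-order bound takes the form

    $\mathrm{Regret}(T) = \tilde{O}\big(\sqrt{L^\star}\big)$,

which improves on a worst-case $\tilde{O}(\sqrt{T})$ guarantee whenever $L^\star \ll T$.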
A blackbox approach to best of both worlds in bandits and beyond
Best-of-both-worlds algorithms for online learning which achieve near-optimal regret in both
the adversarial and the stochastic regimes have received growing attention recently …
Efficient first-order contextual bandits: Prediction, allocation, and triangular discrimination
DJ Foster, A Krishnamurthy - Advances in Neural …, 2021 - proceedings.neurips.cc
A recurring theme in statistical learning, online learning, and beyond is that faster
convergence rates are possible for problems with low noise, often quantified by the …
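The divergence named in the title is the triangular discrimination, a standard $f$-divergence; for distributions $P, Q$ on a discrete set it is

    $D_{\triangle}(P, Q) = \sum_{a} \frac{(P(a) - Q(a))^2}{P(a) + Q(a)}$,

which is equivalent, up to constant factors, to the squared Hellinger distance.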
Parameter-free multi-armed bandit algorithms with hybrid data-dependent regret bounds
S Ito - Conference on Learning Theory, 2021 - proceedings.mlr.press
This paper presents multi-armed bandit (MAB) algorithms that work well in adversarial
environments and that offer improved performance by exploiting inherent structures in such …
First- and second-order bounds for adversarial linear contextual bandits
We consider the adversarial linear contextual bandit setting, which allows for the loss
functions associated with each of $K$ arms to change over time without restriction …
An exploration-by-optimization approach to best of both worlds in linear bandits
S Ito, K Takemura - Advances in Neural Information …, 2023 - proceedings.neurips.cc
In this paper, we consider how to construct best-of-both-worlds linear bandit algorithms that
achieve nearly optimal performance for both stochastic and adversarial environments. For …
Return of the bias: Almost minimax optimal high probability bounds for adversarial linear bandits
J Zimmert, T Lattimore - Conference on Learning Theory, 2022 - proceedings.mlr.press
We introduce a modification of follow the regularised leader and combine it with the log
determinant potential and suitable loss estimators to prove that the minimax regret for …
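As context for the snippet (a generic form, not the authors' exact algorithm; $F$ and $\hat{\ell}_s$ are notation introduced here): follow the regularised leader with potential $F$ and loss estimators $\hat{\ell}_s$ plays

    $x_{t+1} = \arg\min_{x \in \mathcal{X}} \Big( \sum_{s=1}^{t} \langle \hat{\ell}_s, x \rangle + F(x) \Big)$,

with the log determinant potential being one particular choice of $F$.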
More benefits of being distributional: Second-order bounds for reinforcement learning
In this paper, we prove that Distributional Reinforcement Learning (DistRL), which learns the
return distribution, can obtain second-order bounds in both online and offline RL in general …
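For reference, a hedged sketch of the generic shape of a second-order bound (not the paper's exact statement; $\sigma_t^2$ is notation introduced here): writing $\sigma_t^2$ for the variance of the return at step $t$, the regret scales as

    $\mathrm{Regret}(T) = \tilde{O}\Big(\sqrt{\textstyle\sum_{t=1}^{T} \sigma_t^2}\Big)$,

which recovers the worst-case rate when variances are maximal and improves when returns are nearly deterministic.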
Hybrid regret bounds for combinatorial semi-bandits and adversarial linear bandits
S Ito - Advances in Neural Information Processing Systems, 2021 - proceedings.neurips.cc
This study aims to develop bandit algorithms that automatically exploit tendencies of certain
environments to improve performance, without any prior knowledge regarding the …
Delay and cooperation in nonstochastic linear bandits
This paper offers a nearly optimal algorithm for online linear optimization with delayed
bandit feedback. Online linear optimization with bandit feedback, or nonstochastic linear …