First-order regret in reinforcement learning with linear function approximation: A robust estimation approach

AJ Wagenmaker, Y Chen… - International …, 2022 - proceedings.mlr.press
Obtaining first-order regret bounds—regret bounds scaling not as the worst-case but with
some measure of the performance of the optimal policy on a given instance—is a core …

A blackbox approach to best of both worlds in bandits and beyond

C Dann, CY Wei, J Zimmert - The Thirty Sixth Annual …, 2023 - proceedings.mlr.press
Best-of-both-worlds algorithms for online learning which achieve near-optimal regret in both
the adversarial and the stochastic regimes have received growing attention recently …

Efficient first-order contextual bandits: Prediction, allocation, and triangular discrimination

DJ Foster, A Krishnamurthy - Advances in Neural …, 2021 - proceedings.neurips.cc
A recurring theme in statistical learning, online learning, and beyond is that faster
convergence rates are possible for problems with low noise, often quantified by the …

Parameter-free multi-armed bandit algorithms with hybrid data-dependent regret bounds

S Ito - Conference on Learning Theory, 2021 - proceedings.mlr.press
This paper presents multi-armed bandit (MAB) algorithms that work well in adversarial
environments and that offer improved performance by exploiting inherent structures in such …

First- and second-order bounds for adversarial linear contextual bandits

J Olkhovskaya, J Mayo, T van Erven… - Advances in Neural …, 2024 - proceedings.neurips.cc
We consider the adversarial linear contextual bandit setting, which allows for the loss
functions associated with each of $K$ arms to change over time without restriction …

An exploration-by-optimization approach to best of both worlds in linear bandits

S Ito, K Takemura - Advances in Neural Information …, 2023 - proceedings.neurips.cc
In this paper, we consider how to construct best-of-both-worlds linear bandit algorithms that
achieve nearly optimal performance for both stochastic and adversarial environments. For …

Return of the bias: Almost minimax optimal high probability bounds for adversarial linear bandits

J Zimmert, T Lattimore - Conference on Learning Theory, 2022 - proceedings.mlr.press
We introduce a modification of follow the regularised leader and combine it with the log
determinant potential and suitable loss estimators to prove that the minimax regret for …

More benefits of being distributional: Second-order bounds for reinforcement learning

K Wang, O Oertell, A Agarwal, N Kallus… - arXiv preprint arXiv …, 2024 - arxiv.org
In this paper, we prove that Distributional Reinforcement Learning (DistRL), which learns the
return distribution, can obtain second-order bounds in both online and offline RL in general …

Hybrid regret bounds for combinatorial semi-bandits and adversarial linear bandits

S Ito - Advances in Neural Information Processing Systems, 2021 - proceedings.neurips.cc
This study aims to develop bandit algorithms that automatically exploit tendencies of certain
environments to improve performance, without any prior knowledge regarding the …

Delay and cooperation in nonstochastic linear bandits

S Ito, D Hatano, H Sumita… - Advances in …, 2020 - proceedings.neurips.cc
This paper offers a nearly optimal algorithm for online linear optimization with delayed
bandit feedback. Online linear optimization with bandit feedback, or nonstochastic linear …