First-order regret in reinforcement learning with linear function approximation: A robust estimation approach

AJ Wagenmaker, Y Chen… - International …, 2022 - proceedings.mlr.press
Obtaining first-order regret bounds—regret bounds scaling not as the worst-case but with
some measure of the performance of the optimal policy on a given instance—is a core …

A blackbox approach to best of both worlds in bandits and beyond

C Dann, CY Wei, J Zimmert - The Thirty Sixth Annual …, 2023 - proceedings.mlr.press
Best-of-both-worlds algorithms for online learning which achieve near-optimal regret in both
the adversarial and the stochastic regimes have received growing attention recently …

Efficient first-order contextual bandits: Prediction, allocation, and triangular discrimination

DJ Foster, A Krishnamurthy - Advances in Neural …, 2021 - proceedings.neurips.cc
A recurring theme in statistical learning, online learning, and beyond is that faster
convergence rates are possible for problems with low noise, often quantified by the …

Parameter-free multi-armed bandit algorithms with hybrid data-dependent regret bounds

S Ito - Conference on Learning Theory, 2021 - proceedings.mlr.press
This paper presents multi-armed bandit (MAB) algorithms that work well in adversarial
environments and that offer improved performance by exploiting inherent structures in such …

First- and second-order bounds for adversarial linear contextual bandits

J Olkhovskaya, J Mayo, T van Erven… - Advances in Neural …, 2024 - proceedings.neurips.cc
We consider the adversarial linear contextual bandit setting, which allows for the loss
functions associated with each of $K$ arms to change over time without restriction …

An exploration-by-optimization approach to best of both worlds in linear bandits

S Ito, K Takemura - Advances in Neural Information …, 2023 - proceedings.neurips.cc
In this paper, we consider how to construct best-of-both-worlds linear bandit algorithms that
achieve nearly optimal performance for both stochastic and adversarial environments. For …

Return of the bias: Almost minimax optimal high probability bounds for adversarial linear bandits

J Zimmert, T Lattimore - Conference on Learning Theory, 2022 - proceedings.mlr.press
We introduce a modification of follow the regularised leader and combine it with the log
determinant potential and suitable loss estimators to prove that the minimax regret for …

More benefits of being distributional: Second-order bounds for reinforcement learning

K Wang, O Oertell, A Agarwal, N Kallus… - arXiv preprint arXiv …, 2024 - arxiv.org
In this paper, we prove that Distributional Reinforcement Learning (DistRL), which learns the
return distribution, can obtain second-order bounds in both online and offline RL in general …

Hybrid regret bounds for combinatorial semi-bandits and adversarial linear bandits

S Ito - Advances in Neural Information Processing Systems, 2021 - proceedings.neurips.cc
This study aims to develop bandit algorithms that automatically exploit tendencies of certain
environments to improve performance, without any prior knowledge regarding the …

Delay and cooperation in nonstochastic linear bandits

S Ito, D Hatano, H Sumita… - Advances in …, 2020 - proceedings.neurips.cc
This paper offers a nearly optimal algorithm for online linear optimization with delayed
bandit feedback. Online linear optimization with bandit feedback, or nonstochastic linear …