A model selection approach for corruption robust reinforcement learning
We develop a model selection approach to tackle reinforcement learning with adversarial
corruption in both transition and reward. For finite-horizon tabular MDPs, without prior …
Refined regret for adversarial MDPs with linear function approximation
We consider learning in an adversarial Markov Decision Process (MDP) where the loss
functions can change arbitrarily over $K$ episodes and the state space can be arbitrarily …
Improved best-of-both-worlds guarantees for multi-armed bandits: FTRL with general regularizers and multiple optimal arms
We study the problem of designing adaptive multi-armed bandit algorithms that perform
optimally in both the stochastic setting and the adversarial setting simultaneously (often …
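As a point of reference (a standard formalization in this literature, not a quote of this paper's specific bounds), a best-of-both-worlds guarantee for a $K$-armed bandit over $T$ rounds asks for a single algorithm achieving, up to logarithmic factors,
$$\mathrm{Regret}_T = \mathcal{O}\big(\sqrt{KT}\big) \ \text{in the adversarial regime}, \qquad \mathrm{Regret}_T = \mathcal{O}\Big(\sum_{i:\,\Delta_i>0} \tfrac{\log T}{\Delta_i}\Big) \ \text{in the stochastic regime},$$
where $\Delta_i$ denotes the suboptimality gap of arm $i$. The FTRL-based algorithms in this line of work obtain both guarantees without knowing in advance which regime they face.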
Nearly optimal best-of-both-worlds algorithms for online learning with feedback graphs
This study considers online learning with general directed feedback graphs. For this
problem, we present best-of-both-worlds algorithms that achieve nearly tight regret bounds …
A blackbox approach to best of both worlds in bandits and beyond
Best-of-both-worlds algorithms for online learning which achieve near-optimal regret in both
the adversarial and the stochastic regimes have received growing attention recently …
Best of both worlds policy optimization
Policy optimization methods are popular reinforcement learning algorithms in practice and
recent works have built theoretical foundations for them by proving $\sqrt{T}$ regret bounds …
The best of both worlds: stochastic and adversarial episodic MDPs with unknown transition
We consider the best-of-both-worlds problem for learning an episodic Markov Decision
Process through $T$ episodes, with the goal of achieving $\widetilde{\mathcal{O}}(\sqrt …
Stability-penalty-adaptive follow-the-regularized-leader: Sparsity, game-dependency, and best-of-both-worlds
Adaptivity to the difficulties of a problem is a key property in sequential decision-making
problems to broaden the applicability of algorithms. Follow-the-regularized-leader (FTRL) …
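For context, the generic FTRL update (textbook form, not the specific stability-penalty-adaptive regularizer proposed in this paper) picks the next decision by minimizing cumulative estimated losses plus a regularization term:
$$x_{t+1} = \operatorname*{arg\,min}_{x \in \mathcal{X}} \Big\{ \textstyle\sum_{s=1}^{t} \langle \widehat{\ell}_s, x \rangle + \psi_t(x) \Big\},$$
where $\mathcal{X}$ is the decision set (e.g., the probability simplex over arms), $\widehat{\ell}_s$ are loss estimates, and $\psi_t$ is a possibly time-varying regularizer; adaptive variants tune the learning rate inside $\psi_t$ using observed quantities such as stability and penalty terms.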
A best-of-both-worlds algorithm for bandits with delayed feedback
S Masoudian, J Zimmert… - Advances in Neural …, 2022 - proceedings.neurips.cc
We present a modified tuning of the algorithm of Zimmert and Seldin [2020] for adversarial
multi-armed bandits with delayed feedback, which in addition to the minimax optimal …
No-regret online reinforcement learning with adversarial losses and transitions
Existing online learning algorithms for adversarial Markov Decision Processes achieve
$\mathcal{O}(\sqrt{T})$ regret after $T$ rounds of interactions even if the loss functions …