A model selection approach for corruption robust reinforcement learning

CY Wei, C Dann, J Zimmert - International Conference on …, 2022 - proceedings.mlr.press
We develop a model selection approach to tackle reinforcement learning with adversarial
corruption in both transitions and rewards. For finite-horizon tabular MDPs, without prior …

Refined regret for adversarial MDPs with linear function approximation

Y Dai, H Luo, CY Wei, J Zimmert - … Conference on Machine …, 2023 - proceedings.mlr.press
We consider learning in an adversarial Markov Decision Process (MDP) where the loss
functions can change arbitrarily over $K$ episodes and the state space can be arbitrarily …

Improved best-of-both-worlds guarantees for multi-armed bandits: FTRL with general regularizers and multiple optimal arms

T Jin, J Liu, H Luo - Advances in Neural Information …, 2023 - proceedings.neurips.cc
We study the problem of designing adaptive multi-armed bandit algorithms that perform
optimally in both the stochastic setting and the adversarial setting simultaneously (often …
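
As context for the FTRL-based best-of-both-worlds results in this entry and the ones below, here is a minimal sketch of FTRL with the $\frac{1}{2}$-Tsallis-entropy regularizer (the Tsallis-INF algorithm of Zimmert and Seldin), the baseline these works build on and refine. The learning-rate schedule, Newton solver, and importance-weighted estimator are illustrative simplifications, not the tuned versions analyzed in any paper listed here.

```python
import numpy as np

def tsallis_inf_probs(cum_loss, eta, n_newton=50):
    # FTRL step with the 1/2-Tsallis-entropy regularizer:
    # p_i = (eta * (cum_loss_i - x))^(-2), with the normalizer
    # x < min_i cum_loss_i found by Newton's method.
    x = cum_loss.min() - 1.0 / eta   # here sum_i p_i >= 1, so Newton converges from above
    for _ in range(n_newton):
        w = (eta * (cum_loss - x)) ** -2.0                       # unnormalized probabilities
        f = w.sum() - 1.0                                        # normalization residual
        df = 2.0 * eta * ((eta * (cum_loss - x)) ** -3.0).sum()  # derivative of f in x, > 0
        x -= f / df
    p = (eta * (cum_loss - x)) ** -2.0
    return p / p.sum()               # guard against residual numerical error

def tsallis_inf(loss_fn, n_arms, horizon, seed=0):
    # Bandit loop with inverse-propensity loss estimates;
    # loss_fn(t, arm) must return an observed loss in [0, 1].
    rng = np.random.default_rng(seed)
    cum_loss = np.zeros(n_arms)      # cumulative unbiased loss estimates
    for t in range(1, horizon + 1):
        p = tsallis_inf_probs(cum_loss, eta=1.0 / np.sqrt(t))
        arm = rng.choice(n_arms, p=p)
        cum_loss[arm] += loss_fn(t, arm) / p[arm]  # importance-weighted estimate
    return cum_loss
```

With this regularizer, a single algorithm attains $\widetilde{\mathcal{O}}(\sqrt{KT})$ regret against adversarial losses and logarithmic regret under i.i.d. losses with a unique optimal arm; the entries in this list extend the analysis to general regularizers and multiple optimal arms, feedback graphs, delayed feedback, and MDPs.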

Nearly optimal best-of-both-worlds algorithms for online learning with feedback graphs

S Ito, T Tsuchiya, J Honda - Advances in Neural Information …, 2022 - proceedings.neurips.cc
This study considers online learning with general directed feedback graphs. For this
problem, we present best-of-both-worlds algorithms that achieve nearly tight regret bounds …

A blackbox approach to best of both worlds in bandits and beyond

C Dann, CY Wei, J Zimmert - The Thirty Sixth Annual …, 2023 - proceedings.mlr.press
Best-of-both-worlds algorithms for online learning, which achieve near-optimal regret in both
the adversarial and the stochastic regimes, have received growing attention recently …

Best of both worlds policy optimization

C Dann, CY Wei, J Zimmert - International Conference on …, 2023 - proceedings.mlr.press
Policy optimization methods are popular reinforcement learning algorithms in practice, and
recent works have built theoretical foundations for them by proving $\sqrt{T}$ regret bounds …

The best of both worlds: stochastic and adversarial episodic MDPs with unknown transition

T Jin, L Huang, H Luo - Advances in Neural Information …, 2021 - proceedings.neurips.cc
We consider the best-of-both-worlds problem for learning an episodic Markov Decision
Process through $T$ episodes, with the goal of achieving $\widetilde{\mathcal{O}}(\sqrt …
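
For orientation, the guarantee this line of best-of-both-worlds work targets typically has the following two-regime shape; the exact rates, gap definitions, and lower-order terms vary paper by paper, so this generic form is an annotation rather than a quoted theorem:

$$
\mathrm{Reg}(T) =
\begin{cases}
\widetilde{\mathcal{O}}\!\left(\sqrt{T}\right) & \text{if the losses are adversarial,}\\[4pt]
\mathcal{O}\!\left(\sum_{a \neq a^\star} \dfrac{\log T}{\Delta_a}\right) & \text{if the losses are stochastic with suboptimality gaps } \Delta_a.
\end{cases}
$$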

Stability-penalty-adaptive follow-the-regularized-leader: Sparsity, game-dependency, and best-of-both-worlds

T Tsuchiya, S Ito, J Honda - Advances in Neural Information …, 2023 - proceedings.neurips.cc
Adaptivity to the difficulty of a problem is a key property in sequential decision-making,
as it broadens the applicability of algorithms. Follow-the-regularized-leader (FTRL) …

A best-of-both-worlds algorithm for bandits with delayed feedback

S Masoudian, J Zimmert… - Advances in Neural …, 2022 - proceedings.neurips.cc
We present a modified tuning of the algorithm of Zimmert and Seldin [2020] for adversarial
multi-armed bandits with delayed feedback, which in addition to the minimax optimal …
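
The minimax rate alluded to here is, up to constants, $\mathcal{O}(\sqrt{KT} + \sqrt{D \log K})$ for $K$ arms and total delay $D$, as established by Zimmert and Seldin [2020]; the data-dependent refinements beyond this rate are the contribution of the paper itself.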

No-regret online reinforcement learning with adversarial losses and transitions

T Jin, J Liu, C Rouyer, W Chang… - Advances in Neural …, 2024 - proceedings.neurips.cc
Existing online learning algorithms for adversarial Markov Decision Processes achieve
$\mathcal{O}(\sqrt{T})$ regret after $T$ rounds of interactions even if the loss functions …