Adaptivity and non-stationarity: Problem-dependent dynamic regret for online convex optimization

P Zhao, YJ Zhang, L Zhang, ZH Zhou - Journal of Machine Learning …, 2024 - jmlr.org
We investigate online convex optimization in non-stationary environments and choose
dynamic regret as the performance measure, defined as the difference between cumulative …

Regret minimization and convergence to equilibria in general-sum Markov games

L Erez, T Lancewicki, U Sherman… - International …, 2023 - proceedings.mlr.press
An abundance of recent impossibility results establish that regret minimization in Markov
games with adversarial opponents is both statistically and computationally intractable …

The best of both worlds: stochastic and adversarial episodic MDPs with unknown transition

T Jin, L Huang, H Luo - Advances in Neural Information …, 2021 - proceedings.neurips.cc
We consider the best-of-both-worlds problem for learning an episodic Markov Decision
Process through $T$ episodes, with the goal of achieving $\widetilde{\mathcal{O}}(\sqrt …

Simultaneously learning stochastic and adversarial episodic MDPs with known transition

T Jin, H Luo - Advances in neural information processing …, 2020 - proceedings.neurips.cc
This work studies the problem of learning episodic Markov Decision Processes with known
transition and bandit feedback. We develop the first algorithm with a "best-of-both …

Stability-penalty-adaptive follow-the-regularized-leader: Sparsity, game-dependency, and best-of-both-worlds

T Tsuchiya, S Ito, J Honda - Advances in Neural Information …, 2024 - proceedings.neurips.cc
Adaptivity to the difficulties of a problem is a key property in sequential decision-making
problems to broaden the applicability of algorithms. Follow-the-regularized-leader (FTRL) …

Bias no more: high-probability data-dependent regret bounds for adversarial bandits and MDPs

CW Lee, H Luo, CY Wei… - Advances in neural …, 2020 - proceedings.neurips.cc
We develop a new approach to obtaining high probability regret bounds for online learning
with bandit feedback against an adaptive adversary. While existing approaches all require …

Towards best-of-all-worlds online learning with feedback graphs

L Erez, T Koren - Advances in Neural Information …, 2021 - proceedings.neurips.cc
We study the online learning with feedback graphs framework introduced by Mannor and
Shamir (2011), in which the feedback received by the online learner is specified by a graph …

Minimax regret for stochastic shortest path with adversarial costs and known transition

L Chen, H Luo, CY Wei - Conference on Learning Theory, 2021 - proceedings.mlr.press
We study the stochastic shortest path problem with adversarial costs and known transition,
and show that the minimax regret is $O(\sqrt{DT_\star K})$ and $O(\sqrt{DT_\star SAK})$ …

Knowledge-aware conversational preference elicitation with bandit feedback

C Zhao, T Yu, Z Xie, S Li - Proceedings of the ACM Web Conference …, 2022 - dl.acm.org
Conversational recommender systems (CRSs) have been proposed recently to mitigate the
cold-start problem suffered by the traditional recommender systems. By introducing …

Corralling a larger band of bandits: A case study on switching regret for linear bandits

H Luo, M Zhang, P Zhao… - Conference on Learning …, 2022 - proceedings.mlr.press
We consider the problem of combining and learning over a set of adversarial bandit
algorithms with the goal of adaptively tracking the best one on the fly. The Corral algorithm of …