Adaptivity and non-stationarity: Problem-dependent dynamic regret for online convex optimization
We investigate online convex optimization in non-stationary environments and choose
dynamic regret as the performance measure, defined as the difference between cumulative …
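For reference (standard notation assumed here, not quoted from the truncated abstract), the universal dynamic regret compares the learner's decisions $x_1, \dots, x_T$ against an arbitrary comparator sequence $u_1, \dots, u_T$ in the feasible domain: $\mathrm{D\text{-}Regret}_T(u_1, \dots, u_T) = \sum_{t=1}^{T} f_t(x_t) - \sum_{t=1}^{T} f_t(u_t)$, where $f_t$ is the convex loss revealed at round $t$.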
Regret minimization and convergence to equilibria in general-sum Markov games
L Erez, T Lancewicki, U Sherman… - International …, 2023 - proceedings.mlr.press
An abundance of recent impossibility results establish that regret minimization in Markov
games with adversarial opponents is both statistically and computationally intractable …
The best of both worlds: stochastic and adversarial episodic MDPs with unknown transition
We consider the best-of-both-worlds problem for learning an episodic Markov Decision
Process through $T$ episodes, with the goal of achieving $\widetilde{\mathcal{O}}(\sqrt{…})$
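In the standard best-of-both-worlds formulation (assumed here, since the abstract is truncated), a single algorithm should guarantee $\widetilde{\mathcal{O}}(\sqrt{T})$ regret when the losses are adversarial while simultaneously achieving polylogarithmic, gap-dependent regret when the losses are stochastic, without knowing in advance which regime it faces.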
Simultaneously learning stochastic and adversarial episodic MDPs with known transition
This work studies the problem of learning episodic Markov Decision Processes with known
transition and bandit feedback. We develop the first algorithm with a "best-of-both …
Stability-penalty-adaptive follow-the-regularized-leader: Sparsity, game-dependency, and best-of-both-worlds
Adaptivity to the difficulty of a problem is a key property in sequential decision-making, as it broadens the applicability of algorithms. Follow-the-regularized-leader (FTRL) …
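As background (a standard formulation, not taken from the abstract), FTRL selects $x_{t+1} = \arg\min_{x \in \mathcal{X}} \{ \sum_{s=1}^{t} \langle \ell_s, x \rangle + \psi_{t+1}(x) \}$, where $\ell_s$ are the observed or estimated losses and $\psi_{t+1}$ is a time-varying regularizer; stability-penalty-adaptive variants tune the regularizer's learning rate online from accumulated stability and penalty terms rather than fixing it in advance.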
Bias no more: high-probability data-dependent regret bounds for adversarial bandits and MDPs
We develop a new approach to obtaining high probability regret bounds for online learning
with bandit feedback against an adaptive adversary. While existing approaches all require …
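Data-dependent bounds of this kind typically replace the worst-case $\sqrt{T}$ dependence with a quantity such as $L^\star$, the cumulative loss of the best fixed action in hindsight, giving regret of order $\widetilde{\mathcal{O}}(\sqrt{L^\star})$; the difficulty is obtaining such guarantees with high probability against an adaptive adversary (bound form stated as standard background, not quoted from the truncated abstract).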
Towards best-of-all-worlds online learning with feedback graphs
L Erez, T Koren - Advances in Neural Information …, 2021 - proceedings.neurips.cc
We study the online learning with feedback graphs framework introduced by Mannor and
Shamir (2011), in which the feedback received by the online learner is specified by a graph …
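In the Mannor-Shamir framework, playing an action reveals the losses of all its neighbors in a feedback graph $G$; for strongly observable graphs the achievable regret is known to scale with the independence number $\alpha(G)$, on the order of $\widetilde{\mathcal{O}}(\sqrt{\alpha(G)\, T})$ (standard background, assumed here rather than quoted from the abstract).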
Minimax regret for stochastic shortest path with adversarial costs and known transition
We study the stochastic shortest path problem with adversarial costs and known transition,
and show that the minimax regret is $O(\sqrt{D T_\star K})$ and $O(\sqrt{D T_\star S A K})$ …
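In the usual stochastic-shortest-path notation (assumed here, since the abstract is truncated), $D$ denotes the diameter, $T_\star$ the expected hitting time of the optimal policy, $S$ and $A$ the numbers of states and actions, and $K$ the number of episodes; the two bounds correspond to full-information and bandit feedback, respectively.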
Knowledge-aware conversational preference elicitation with bandit feedback
Conversational recommender systems (CRSs) have been proposed recently to mitigate the
cold-start problem suffered by traditional recommender systems. By introducing …
Corralling a larger band of bandits: A case study on switching regret for linear bandits
We consider the problem of combining and learning over a set of adversarial bandit
algorithms with the goal of adaptively tracking the best one on the fly. The Corral algorithm of …