Bandits with switching costs: $T^{2/3}$ regret

C Jin, T Jin, H Luo, S Sra, T Yu - International Conference on …, 2020 - proceedings.mlr.press

We consider the task of learning in episodic finite-horizon Markov decision processes with
an unknown transition function, bandit feedback, and adversarial losses. We propose an …

Cited by 105 | Related articles | All 8 versions

A simple and provably efficient algorithm for asynchronous federated contextual linear bandits

J He, T Wang, Y Min, Q Gu - Advances in neural information …, 2022 - proceedings.neurips.cc

We study federated contextual linear bandits, where $M$ agents cooperate with each other
to solve a global contextual linear bandit problem with the help of a central server. We …

Cited by 32 | Related articles | All 6 versions

Linear bandits with limited adaptivity and learning distributional optimal design

Y Ruan, J Yang, Y Zhou - Proceedings of the 53rd Annual ACM SIGACT …, 2021 - dl.acm.org

Motivated by practical needs such as large-scale learning, we study the impact of adaptivity
constraints to linear contextual bandits, a central problem in online learning and decision …

Cited by 59 | Related articles | All 6 versions

Provably efficient reinforcement learning with linear function approximation under adaptivity constraints

T Wang, D Zhou, Q Gu - Advances in Neural Information …, 2021 - proceedings.neurips.cc

We study reinforcement learning (RL) with linear function approximation under the adaptivity
constraint. We consider two popular limited adaptivity models: the batch learning model and …

Cited by 48 | Related articles | All 11 versions

Non-stationary online learning with memory and non-stochastic control

P Zhao, YH Yan, YX Wang, ZH Zhou - The Journal of Machine Learning …, 2023 - dl.acm.org

We study the problem of Online Convex Optimization (OCO) with memory, which allows loss
functions to depend on past decisions and thus captures temporal effects of learning …

Cited by 47 | Related articles | All 8 versions

Recent advances in multiarmed bandits for sequential decision making

S Agrawal - Operations Research & Management Science in …, 2019 - pubsonline.informs.org

Reinforcement learning (RL) is a very general framework for making sequential decisions
when the underlying system dynamics are a priori unknown. RL algorithms use the …

Cited by 21 | Related articles

Follow-the-perturbed-leader for adversarial Markov decision processes with bandit feedback

Y Dai, H Luo, L Chen - Advances in Neural Information …, 2022 - proceedings.neurips.cc

We consider regret minimization for Adversarial Markov Decision Processes (AMDPs),
where the loss functions are changing over time and adversarially chosen, and the learner …

Cited by 16 | Related articles | All 7 versions

Online learning for adversaries with memory: price of past mistakes

O Anava, E Hazan, S Mannor - Advances in Neural …, 2015 - proceedings.neurips.cc

The framework of online learning with memory naturally captures learning problems with
temporal effects, and was previously studied for the experts setting. In this work we extend …

Cited by 92 | Related articles | All 6 versions

Non-stochastic multi-player multi-armed bandits: Optimal rate with collision information, sublinear without

S Bubeck, Y Li, Y Peres… - Conference on Learning …, 2020 - proceedings.mlr.press

We consider the non-stochastic version of the (cooperative) multi-player multi-armed bandit
problem. The model assumes no communication and no shared randomness at all between …

Cited by 54 | Related articles | All 5 versions

Online switching control with stability and regret guarantees

Y Li, JA Preiss, N Li, Y Lin… - … for Dynamics and …, 2023 - proceedings.mlr.press

This paper considers online switching control with a finite candidate controller pool, an
unknown dynamical system, and unknown cost functions. The candidate controllers can be …

Cited by 17 | Related articles | All 7 versions