Non-stochastic Bandits With Evolving Observations

Y Bar-On, Y Mansour - arXiv preprint arXiv:2405.16843, 2024 - arxiv.org
We introduce a novel online learning framework that unifies and generalizes
pre-established models, such as delayed and corrupted feedback, to encompass adversarial …

Regret Guarantees for Adversarial Contextual Bandits with Delayed Feedback

L Erez, O Levy, Y Mansour - Seventeenth European Workshop on … - openreview.net
In this paper we present regret minimization algorithms for the contextual multi-armed bandit
(CMAB) problem in the presence of delayed feedback, a scenario where reward …