Non-stochastic Bandits With Evolving Observations
We introduce a novel online learning framework that unifies and generalizes pre-
established models, such as delayed and corrupted feedback, to encompass adversarial …
Regret Guarantees for Adversarial Contextual Bandits with Delayed Feedback
In this paper we present regret minimization algorithms for the contextual multi-armed bandit
(CMAB) problem in the presence of delayed feedback, a scenario where reward …