Follow-the-Perturbed-Leader for Adversarial Markov Decision Processes with Bandit Feedback
We consider regret minimization for Adversarial Markov Decision Processes (AMDPs),
where the loss functions are changing over time and adversarially chosen, and the learner …
Follow-the-Perturbed-Leader for Adversarial Bandits: Heavy Tails, Robustness, and Privacy
We study adversarial bandit problems with potentially heavy-tailed losses. Unlike standard
settings with non-negative and bounded losses, managing negative and unbounded losses …