Follow-the-perturbed-leader for adversarial Markov decision processes with bandit feedback

Y Dai, H Luo, L Chen - Advances in Neural Information …, 2022 - proceedings.neurips.cc
We consider regret minimization for Adversarial Markov Decision Processes (AMDPs),
where the loss functions are changing over time and adversarially chosen, and the learner …

Follow-the-Perturbed-Leader for Adversarial Bandits: Heavy Tails, Robustness, and Privacy

D Cheng, X Zhou, B Ji - openreview.net
We study adversarial bandit problems with potentially heavy-tailed losses. Unlike standard
settings with non-negative and bounded losses, managing negative and unbounded losses …