Adaptive best-of-both-worlds algorithm for heavy-tailed multi-armed bandits

J Huang, Y Dai, L Huang - International Conference on …, 2022 - proceedings.mlr.press
In this paper, we generalize the concept of heavy-tailed multi-armed bandits to adversarial
environments, and develop robust best-of-both-worlds algorithms for heavy-tailed multi …

Banker online mirror descent: A universal approach for delayed online bandit learning

J Huang, Y Dai, L Huang - International Conference on …, 2023 - proceedings.mlr.press
We propose Banker Online Mirror Descent (Banker-OMD), a novel framework
generalizing the classical Online Mirror Descent (OMD) technique in the online learning …

Scale-free adversarial multi armed bandits

SR Putta, S Agrawal - International Conference on …, 2022 - proceedings.mlr.press
We consider the Scale-Free Adversarial Multi Armed Bandits (MAB) problem. At the
beginning of the game, the player only knows the number of arms $n$. It does not know the …

BANDITQ: Fair Bandits with Guaranteed Rewards

A Sinha - The 40th Conference on Uncertainty in Artificial …, 2024 - openreview.net
Classic no-regret multi-armed bandit algorithms, including the Upper Confidence Bound
(UCB), Hedge, and EXP3, are inherently unfair by design. Their unfairness stems from their …

Improved Algorithms for Adversarial Bandits with Unbounded Losses

M Chen, X Zhang - arXiv preprint arXiv:2310.01756, 2023 - arxiv.org
We consider the Adversarial Multi-Armed Bandits (MAB) problem with unbounded losses,
where the algorithms have no prior knowledge on the sizes of the losses. We present UMAB …

From optimality to robustness: Adaptive re-sampling strategies in stochastic bandits

D Baudry, P Saux, OA Maillard - Advances in Neural …, 2021 - proceedings.neurips.cc
The stochastic multi-arm bandit problem has been extensively studied under standard
assumptions on the arm's distribution (e.g., bounded with known support, exponential family …

$(\varepsilon, u)$-Adaptive Regret Minimization in Heavy-Tailed Bandits

G Genalti, L Marsigli, N Gatti… - The Thirty Seventh …, 2024 - proceedings.mlr.press
Heavy-tailed distributions naturally arise in several settings, from finance to
telecommunications. While regret minimization under subgaussian or bounded rewards has …

From optimality to robustness: Dirichlet sampling strategies in stochastic bandits

D Baudry, P Saux, OA Maillard - NeurIPS 2021-35th …, 2021 - proceedings.neurips.cc
The stochastic multi-arm bandit problem has been extensively studied under standard
assumptions on the arm's distribution (e.g., bounded with known support, exponential family …

Scale-free Adversarial Reinforcement Learning

M Chen, X Zhang - arXiv preprint arXiv:2403.00930, 2024 - arxiv.org
This paper initiates the study of scale-free learning in Markov Decision Processes (MDPs),
where the scale of rewards/losses is unknown to the learner. We design a generic …

FMICA: Future Mobility and Imminent Computation-Aware Task Offloading in Vehicular Fog Environment

N Keshari, D Singh - Arabian Journal for Science and Engineering, 2023 - Springer
Vehicular fog computing (VFC) is a technology that enhances vehicular applications by
offloading the task of the resource-restricted vehicle to the resourceful fog node (vehicle or …