Adaptation to the range in k-armed bandits

J Huang, Y Dai, L Huang - International Conference on …, 2022 - proceedings.mlr.press

In this paper, we generalize the concept of heavy-tailed multi-armed bandits to adversarial
environments, and develop robust best-of-both-worlds algorithms for heavy-tailed multi …

Cited by 18 Related articles All 3 versions

Banker online mirror descent: A universal approach for delayed online bandit learning

J Huang, Y Dai, L Huang - International Conference on …, 2023 - proceedings.mlr.press

We propose Banker Online Mirror Descent (Banker-OMD), a novel framework
generalizing the classical Online Mirror Descent (OMD) technique in the online learning …

Cited by 5 Related articles All 6 versions

Scale-free adversarial multi armed bandits

SR Putta, S Agrawal - International Conference on …, 2022 - proceedings.mlr.press

We consider the Scale-Free Adversarial Multi Armed Bandits (MAB) problem. At the
beginning of the game, the player only knows the number of arms $n$. It does not know the …

Cited by 14 Related articles All 3 versions

BANDITQ: Fair Bandits with Guaranteed Rewards

A Sinha - The 40th Conference on Uncertainty in Artificial …, 2024 - openreview.net

Classic no-regret multi-armed bandit algorithms, including the Upper Confidence Bound
(UCB), Hedge, and EXP3, are inherently unfair by design. Their unfairness stems from their …

Cited by 1 Related articles All 3 versions

Improved Algorithms for Adversarial Bandits with Unbounded Losses

M Chen, X Zhang - arXiv preprint arXiv:2310.01756, 2023 - arxiv.org

We consider the Adversarial Multi-Armed Bandits (MAB) problem with unbounded losses,
where the algorithms have no prior knowledge on the sizes of the losses. We present UMAB …

Cited by 3 Related articles All 2 versions

From optimality to robustness: Adaptive re-sampling strategies in stochastic bandits

D Baudry, P Saux, OA Maillard - Advances in Neural …, 2021 - proceedings.neurips.cc

The stochastic multi-arm bandit problem has been extensively studied under standard
assumptions on the arm's distribution (e.g., bounded with known support, exponential family …

Cited by 6 Related articles All 3 versions

$(\varepsilon, u)$-Adaptive Regret Minimization in Heavy-Tailed Bandits

G Genalti, L Marsigli, N Gatti… - The Thirty Seventh …, 2024 - proceedings.mlr.press

Heavy-tailed distributions naturally arise in several settings, from finance to
telecommunications. While regret minimization under subgaussian or bounded rewards has …

From optimality to robustness: Dirichlet sampling strategies in stochastic bandits

D Baudry, P Saux, OA Maillard - NeurIPS 2021-35th …, 2021 - proceedings.neurips.cc

The stochastic multi-arm bandit problem has been extensively studied under standard
assumptions on the arm's distribution (e.g., bounded with known support, exponential family …

Cited by 6 Related articles All 14 versions

Scale-free Adversarial Reinforcement Learning

M Chen, X Zhang - arXiv preprint arXiv:2403.00930, 2024 - arxiv.org

This paper initiates the study of scale-free learning in Markov Decision Processes (MDPs),
where the scale of rewards/losses is unknown to the learner. We design a generic …

Cited by 1 Related articles All 2 versions

FMICA: Future Mobility and Imminent Computation-Aware Task Offloading in Vehicular Fog Environment

N Keshari, D Singh - Arabian Journal for Science and Engineering, 2023 - Springer

Vehicular fog computing (VFC) is a technology that enhances vehicular applications by
offloading the task of the resource-restricted vehicle to the resourceful fog node (vehicle or …