Adaptive best-of-both-worlds algorithm for heavy-tailed multi-armed bandits
In this paper, we generalize the concept of heavy-tailed multi-armed bandits to adversarial
environments, and develop robust best-of-both-worlds algorithms for heavy-tailed multi …
Banker online mirror descent: A universal approach for delayed online bandit learning
Abstract We propose Banker Online Mirror Descent (Banker-OMD), a novel framework
generalizing the classical Online Mirror Descent (OMD) technique in the online learning …
Scale-free adversarial multi armed bandits
Abstract We consider the Scale-Free Adversarial Multi Armed Bandits (MAB) problem. At the
beginning of the game, the player only knows the number of arms $ n $. It does not know the …
BANDITQ: Fair Bandits with Guaranteed Rewards
A Sinha - The 40th Conference on Uncertainty in Artificial …, 2024 - openreview.net
Classic no-regret multi-armed bandit algorithms, including the Upper Confidence Bound
(UCB), Hedge, and EXP3, are inherently unfair by design. Their unfairness stems from their …
Improved Algorithms for Adversarial Bandits with Unbounded Losses
We consider the Adversarial Multi-Armed Bandits (MAB) problem with unbounded losses,
where the algorithms have no prior knowledge on the sizes of the losses. We present UMAB …
From optimality to robustness: Adaptive re-sampling strategies in stochastic bandits
The stochastic multi-arm bandit problem has been extensively studied under standard
assumptions on the arm's distribution (e.g. bounded with known support, exponential family …
$(\epsilon, u)$-Adaptive Regret Minimization in Heavy-Tailed Bandits
Heavy-tailed distributions naturally arise in several settings, from finance to
telecommunications. While regret minimization under subgaussian or bounded rewards has …
From optimality to robustness: Dirichlet sampling strategies in stochastic bandits
The stochastic multi-arm bandit problem has been extensively studied under standard
assumptions on the arm's distribution (e.g. bounded with known support, exponential family …
Scale-free Adversarial Reinforcement Learning
This paper initiates the study of scale-free learning in Markov Decision Processes (MDPs),
where the scale of rewards/losses is unknown to the learner. We design a generic …
FMICA: Future Mobility and Imminent Computation-Aware Task Offloading in Vehicular Fog Environment
Vehicular fog computing (VFC) is a technology that enhances vehicular applications by
offloading the task of the resource-restricted vehicle to the resourceful fog node (vehicle or …