Thompson sampling algorithms for mean-variance bandits

Q Zhu, V Tan - International Conference on Machine …, 2020 - proceedings.mlr.press
The multi-armed bandit (MAB) problem is a classical learning task that exemplifies the
exploration-exploitation tradeoff. However, standard formulations do not take into account …
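The snippet is cut off before it defines the risk criterion. This sketch assumes one convention common in the mean-variance bandit literature, where an arm's mean-variance is MV = ρμ − σ², with ρ ≥ 0 trading off reward against variance and higher MV being better. Other papers negate this or swap the roles of the terms, so treat it as illustrative rather than as this paper's exact definition. The function name `empirical_mean_variance` is hypothetical.

```python
from statistics import fmean, pvariance

def empirical_mean_variance(rewards: list[float], rho: float) -> float:
    """Hypothetical helper: empirical rho * mean - variance (assumed convention, higher is better)."""
    if not rewards:
        raise ValueError("need at least one reward sample")
    mu = fmean(rewards)
    var = pvariance(rewards, mu)  # population (1/n) variance around the sample mean
    return rho * mu - var

# Two illustrative arms with the same empirical mean (1.0) but different spread.
safe_arm = [1.0, 1.0, 1.0, 1.0]
risky_arm = [0.0, 2.0, 0.0, 2.0]

print(empirical_mean_variance(safe_arm, 1.0))   # 1.0
print(empirical_mean_variance(risky_arm, 1.0))  # 0.0
```

A risk-neutral learner would treat these two arms as equivalent. The variance penalty separates them, and larger ρ shifts the preference back toward the mean.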

Concentration bounds for CVaR estimation: The cases of light-tailed and heavy-tailed distributions

LA Prashanth, K Jagannathan… - Proceedings of the 37th …, 2020 - proceedings.mlr.press
Abstract Conditional Value-at-Risk (CVaR) is a widely used risk metric in applications such
as finance. We derive concentration bounds for CVaR estimates, considering separately the …
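To make "CVaR estimate" concrete, here is a minimal sketch of one standard empirical estimator. It applies the Rockafellar–Uryasev form CVaR = VaR + E[(X − VaR)⁺] / (1 − α) to the empirical distribution, with VaR taken as the ⌈nα⌉-th order statistic. It treats X as a loss, so larger values mean more tail risk. Both the loss convention and the helper name `empirical_var_cvar` are assumptions for illustration and are not taken from the paper.

```python
import math

def empirical_var_cvar(losses: list[float], alpha: float) -> tuple[float, float]:
    """Hypothetical helper: empirical VaR and CVaR at level alpha for losses (larger = worse)."""
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must lie in (0, 1)")
    if not losses:
        raise ValueError("need at least one sample")
    n = len(losses)
    ordered = sorted(losses)
    # Empirical VaR: the ceil(n * alpha)-th smallest loss (1-indexed order statistic).
    k = max(1, math.ceil(n * alpha))
    var = ordered[k - 1]
    # Rockafellar-Uryasev form: VaR plus the average excess over VaR, rescaled by 1 / (1 - alpha).
    excess = sum(max(x - var, 0.0) for x in losses)
    cvar = var + excess / (n * (1.0 - alpha))
    return var, cvar

print(empirical_var_cvar([float(x) for x in range(1, 11)], 0.5))  # (5.0, 8.0)
```

When nα is an integer, this estimate equals the average of the largest n(1 − α) losses. In the example that is the mean of 6 through 10, which is 8.0.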

Distribution oblivious, risk-aware algorithms for multi-armed bandits with unbounded rewards.

A Kagrecha, J Nair, KP Jagannathan - NeurIPS, 2019 - proceedings.neurips.cc
Classical multi-armed bandit problems use the expected value of an arm as a metric to
evaluate its goodness. However, the expected value is a risk-neutral metric. In many …

Safe linear stochastic bandits

K Khezeli, E Bitar - Proceedings of the AAAI Conference on Artificial …, 2020 - ojs.aaai.org
We introduce the safe linear stochastic bandit framework—a generalization of linear
stochastic bandits—where, in each stage, the learner is required to select an arm with an …

A revised approach for risk-averse multi-armed bandits under CVaR criterion

N Khajonchotpanya, Y Xue… - Operations Research …, 2021 - Elsevier
We study multi-armed bandit problems that use conditional value-at-risk as an underlying
risk measure. In particular, we propose a new upper confidence bound algorithm and …

Quantile bandits for best arms identification

M Zhang, CS Ong - International conference on machine …, 2021 - proceedings.mlr.press
We consider a variant of the best arm identification task in stochastic multi-armed bandits.
Motivated by risk-averse decision-making problems, our goal is to identify a set of $ m …

Risk-aware multi-armed bandits with refined upper confidence bounds

X Liu, M Derakhshani, S Lambotharan… - IEEE Signal …, 2020 - ieeexplore.ieee.org
The classical multi-armed bandit (MAB) framework studies the exploration-exploitation
dilemma of the decision-making problem and always treats the arm with the highest expected …

Almost optimal variance-constrained best arm identification

Y Hou, VYF Tan, Z Zhong - IEEE Transactions on Information …, 2022 - ieeexplore.ieee.org
We design and analyze Variance-Aware-Lower and Upper Confidence Bound (VA-LUCB), a
parameter-free algorithm, for identifying the best arm under the fixed-confidence setup and …

A survey of risk-aware multi-armed bandits

VYF Tan, K Jagannathan - arXiv preprint arXiv:2205.05843, 2022 - arxiv.org
In several applications such as clinical trials and financial portfolio optimization, the
expected value (or the average reward) does not satisfactorily capture the merits of a drug or …

Probably anytime-safe stochastic combinatorial semi-bandits

Y Hou, VYF Tan, Z Zhong - International Conference on …, 2023 - proceedings.mlr.press
Motivated by concerns about making online decisions that incur an undue amount of risk at
each time step, in this paper, we formulate the probably anytime-safe stochastic …