Thompson sampling algorithms for mean-variance bandits
The multi-armed bandit (MAB) problem is a classical learning task that exemplifies the
exploration-exploitation tradeoff. However, standard formulations do not take into account …
Concentration bounds for CVaR estimation: The cases of light-tailed and heavy-tailed distributions
LA Prashanth, K Jagannathan… - Proceedings of the 37th …, 2020 - proceedings.mlr.press
Abstract Conditional Value-at-Risk (CVaR) is a widely used risk metric in applications such
as finance. We derive concentration bounds for CVaR estimates, considering separately the …
Distribution oblivious, risk-aware algorithms for multi-armed bandits with unbounded rewards.
Classical multi-armed bandit problems use the expected value of an arm as a metric to
evaluate its goodness. However, the expected value is a risk-neutral metric. In many …
Safe linear stochastic bandits
K Khezeli, E Bitar - Proceedings of the AAAI Conference on Artificial …, 2020 - ojs.aaai.org
We introduce the safe linear stochastic bandit framework—a generalization of linear
stochastic bandits—where, in each stage, the learner is required to select an arm with an …
A revised approach for risk-averse multi-armed bandits under CVaR criterion
N Khajonchotpanya, Y Xue… - Operations Research …, 2021 - Elsevier
We study multi-armed bandit problems that use conditional value-at-risk as an underlying
risk measure. In particular, we propose a new upper confidence bound algorithm and …
Quantile bandits for best arms identification
We consider a variant of the best arm identification task in stochastic multi-armed bandits.
Motivated by risk-averse decision-making problems, our goal is to identify a set of $ m …
Risk-aware multi-armed bandits with refined upper confidence bounds
X Liu, M Derakhshani, S Lambotharan… - IEEE Signal …, 2020 - ieeexplore.ieee.org
The classical multi-armed bandit (MAB) framework studies the exploration-exploitation
dilemma of the decision-making problem and always treats the arm with the highest expected …
Almost optimal variance-constrained best arm identification
Y Hou, VYF Tan, Z Zhong - IEEE Transactions on Information …, 2022 - ieeexplore.ieee.org
We design and analyze Variance-Aware-Lower and Upper Confidence Bound (VA-LUCB), a
parameter-free algorithm, for identifying the best arm under the fixed-confidence setup and …
A survey of risk-aware multi-armed bandits
VYF Tan, K Jagannathan - arXiv preprint arXiv:2205.05843, 2022 - arxiv.org
In several applications such as clinical trials and financial portfolio optimization, the
expected value (or the average reward) does not satisfactorily capture the merits of a drug or …
Probably anytime-safe stochastic combinatorial semi-bandits
Y Hou, VYF Tan, Z Zhong - International Conference on …, 2023 - proceedings.mlr.press
Motivated by concerns about making online decisions that incur an undue amount of risk at
each time step, in this paper, we formulate the probably anytime-safe stochastic …