PAC Bandits with Risk Constraints.

Q Zhu, V Tan - International Conference on Machine …, 2020 - proceedings.mlr.press

The multi-armed bandit (MAB) problem is a classical learning task that exemplifies the
exploration-exploitation tradeoff. However, standard formulations do not take into account …

Cited by 50 · Related articles · All 5 versions

Concentration bounds for CVaR estimation: The cases of light-tailed and heavy-tailed distributions

LA Prashanth, K Jagannathan… - Proceedings of the 37th …, 2020 - proceedings.mlr.press

Conditional Value-at-Risk (CVaR) is a widely used risk metric in applications such
as finance. We derive concentration bounds for CVaR estimates, considering separately the …

Cited by 51 · Related articles · All 5 versions

Distribution oblivious, risk-aware algorithms for multi-armed bandits with unbounded rewards.

A Kagrecha, J Nair, KP Jagannathan - NeurIPS, 2019 - proceedings.neurips.cc

Classical multi-armed bandit problems use the expected value of an arm as a metric to
evaluate its goodness. However, the expected value is a risk-neutral metric. In many …

Cited by 46 · Related articles · All 7 versions

Safe linear stochastic bandits

K Khezeli, E Bitar - Proceedings of the AAAI Conference on Artificial …, 2020 - ojs.aaai.org

We introduce the safe linear stochastic bandit framework—a generalization of linear
stochastic bandits—where, in each stage, the learner is required to select an arm with an …

Cited by 31 · Related articles · All 6 versions

A revised approach for risk-averse multi-armed bandits under CVaR criterion

N Khajonchotpanya, Y Xue… - Operations Research …, 2021 - Elsevier

We study multi-armed bandit problems that use conditional value-at-risk as an underlying
risk measure. In particular, we propose a new upper confidence bound algorithm and …

Cited by 19 · Related articles · All 5 versions

Quantile bandits for best arms identification

M Zhang, CS Ong - International conference on machine …, 2021 - proceedings.mlr.press

We consider a variant of the best arm identification task in stochastic multi-armed bandits.
Motivated by risk-averse decision-making problems, our goal is to identify a set of $ m …

Cited by 16 · Related articles · All 5 versions

Risk-aware multi-armed bandits with refined upper confidence bounds

X Liu, M Derakhshani, S Lambotharan… - IEEE Signal …, 2020 - ieeexplore.ieee.org

The classical multi-armed bandit (MAB) framework studies the exploration-exploitation
dilemma of the decision-making problem and always treats the arm with the highest expected …

Cited by 15 · Related articles · All 3 versions

Almost optimal variance-constrained best arm identification

Y Hou, VYF Tan, Z Zhong - IEEE Transactions on Information …, 2022 - ieeexplore.ieee.org

We design and analyze Variance-Aware-Lower and Upper Confidence Bound (VA-LUCB), a
parameter-free algorithm, for identifying the best arm under the fixed-confidence setup and …

Cited by 7 · Related articles · All 5 versions

A survey of risk-aware multi-armed bandits

VYF Tan, K Jagannathan - arXiv preprint arXiv:2205.05843, 2022 - arxiv.org

In several applications such as clinical trials and financial portfolio optimization, the
expected value (or the average reward) does not satisfactorily capture the merits of a drug or …

Cited by 6 · Related articles · All 6 versions

Probably anytime-safe stochastic combinatorial semi-bandits

Y Hou, VYF Tan, Z Zhong - International Conference on …, 2023 - proceedings.mlr.press

Motivated by concerns about making online decisions that incur undue amount of risk at
each time step, in this paper, we formulate the probably anytime-safe stochastic …

Cited by 1 · Related articles · All 6 versions