Conservative contextual linear bandits

[BOOK][B] Bandit algorithms

T Lattimore, C Szepesvári - 2020 - books.google.com

Decision-making in the face of uncertainty is a significant challenge in machine learning,
and the multi-armed bandit model is a commonly used framework to address it. This …

Cited by 3240 · Related articles · All 9 versions

[PDF] arxiv.org

Exploration-exploitation in constrained mdps

Y Efroni, S Mannor, M Pirotta - arXiv preprint arXiv:2003.02189, 2020 - arxiv.org

In many sequential decision-making problems, the goal is to optimize a utility function while
satisfying a set of constraints on different utilities. This learning problem is formalized …

Cited by 175 · Related articles · All 2 versions

[PDF] neurips.cc

Learning policies with zero or bounded constraint violation for constrained mdps

T Liu, R Zhou, D Kalathil, P Kumar… - Advances in Neural …, 2021 - proceedings.neurips.cc

We address the issue of safety in reinforcement learning. We pose the problem in an
episodic framework of a constrained Markov decision process. Existing results have shown …

Cited by 86 · Related articles · All 8 versions

[PDF] arxiv.org

Mostly exploration-free algorithms for contextual bandits

H Bastani, M Bayati, K Khosravi - Management Science, 2021 - pubsonline.informs.org

The contextual bandit literature has traditionally focused on algorithms that address the
exploration–exploitation tradeoff. In particular, greedy algorithms that exploit current …

Cited by 203 · Related articles · All 7 versions

[PDF] neurips.cc

Linear stochastic bandits under safety constraints

S Amani, M Alizadeh… - Advances in Neural …, 2019 - proceedings.neurips.cc

Bandit algorithms have various applications in safety-critical systems, where it is important to
respect the system constraints that rely on the bandit's unknown parameters at every round …

Cited by 128 · Related articles · All 8 versions

[PDF] mlr.press

Stochastic bandits with linear constraints

A Pacchiano, M Ghavamzadeh… - International …, 2021 - proceedings.mlr.press

We study a constrained contextual linear bandit setting, where the goal of the agent is to
produce a sequence of policies, whose expected cumulative reward over the course of …

Cited by 82 · Related articles · All 6 versions

[PDF] mlr.press

Regret minimization with performative feedback

M Jagadeesan, T Zrnic… - … on Machine Learning, 2022 - proceedings.mlr.press

In performative prediction, the deployment of a predictive model triggers a shift in the data
distribution. As these shifts are typically unknown ahead of time, the learner needs to deploy …

Cited by 41 · Related articles · All 3 versions

[PDF] neurips.cc

On kernelized multi-armed bandits with constraints

X Zhou, B Ji - Advances in neural information processing …, 2022 - proceedings.neurips.cc

We study a stochastic bandit problem with a general unknown reward function and a
general unknown constraint function. Both functions can be non-linear (even non-convex) …

Cited by 30 · Related articles · All 10 versions

[PDF] neurips.cc

An efficient pessimistic-optimistic algorithm for stochastic linear bandits with general constraints

X Liu, B Li, P Shi, L Ying - Advances in Neural Information …, 2021 - proceedings.neurips.cc

This paper considers stochastic linear bandits with general nonlinear constraints. The
objective is to maximize the expected cumulative reward over horizon $T$ subject to a set …

Cited by 47 · Related articles · All 11 versions

[PDF] neurips.cc

Offline contextual bandits with high probability fairness guarantees

B Metevier, S Giguere, S Brockman… - Advances in neural …, 2019 - proceedings.neurips.cc

We present RobinHood, an offline contextual bandit algorithm designed to satisfy a broad
family of fairness constraints. Our algorithm accepts multiple fairness definitions and allows …

Cited by 63 · Related articles · All 10 versions