[BOOK][B] Bandit algorithms

T Lattimore, C Szepesvári - 2020 - books.google.com
Decision-making in the face of uncertainty is a significant challenge in machine learning,
and the multi-armed bandit model is a commonly used framework to address it. This …

Exploration-exploitation in constrained MDPs

Y Efroni, S Mannor, M Pirotta - arXiv preprint arXiv:2003.02189, 2020 - arxiv.org
In many sequential decision-making problems, the goal is to optimize a utility function while
satisfying a set of constraints on different utilities. This learning problem is formalized …

Learning policies with zero or bounded constraint violation for constrained MDPs

T Liu, R Zhou, D Kalathil, P Kumar… - Advances in Neural …, 2021 - proceedings.neurips.cc
We address the issue of safety in reinforcement learning. We pose the problem in an
episodic framework of a constrained Markov decision process. Existing results have shown …

Mostly exploration-free algorithms for contextual bandits

H Bastani, M Bayati, K Khosravi - Management Science, 2021 - pubsonline.informs.org
The contextual bandit literature has traditionally focused on algorithms that address the
exploration–exploitation tradeoff. In particular, greedy algorithms that exploit current …

Linear stochastic bandits under safety constraints

S Amani, M Alizadeh… - Advances in Neural …, 2019 - proceedings.neurips.cc
Bandit algorithms have various applications in safety-critical systems, where it is important to
respect the system constraints that rely on the bandit's unknown parameters at every round …

Stochastic bandits with linear constraints

A Pacchiano, M Ghavamzadeh… - International …, 2021 - proceedings.mlr.press
We study a constrained contextual linear bandit setting, where the goal of the agent is to
produce a sequence of policies, whose expected cumulative reward over the course of …

Regret minimization with performative feedback

M Jagadeesan, T Zrnic… - … on Machine Learning, 2022 - proceedings.mlr.press
In performative prediction, the deployment of a predictive model triggers a shift in the data
distribution. As these shifts are typically unknown ahead of time, the learner needs to deploy …

On kernelized multi-armed bandits with constraints

X Zhou, B Ji - Advances in neural information processing …, 2022 - proceedings.neurips.cc
We study a stochastic bandit problem with a general unknown reward function and a
general unknown constraint function. Both functions can be non-linear (even non-convex) …

An efficient pessimistic-optimistic algorithm for stochastic linear bandits with general constraints

X Liu, B Li, P Shi, L Ying - Advances in Neural Information …, 2021 - proceedings.neurips.cc
This paper considers stochastic linear bandits with general nonlinear constraints. The
objective is to maximize the expected cumulative reward over horizon $T$ subject to a set …

Offline contextual bandits with high probability fairness guarantees

B Metevier, S Giguere, S Brockman… - Advances in neural …, 2019 - proceedings.neurips.cc
We present RobinHood, an offline contextual bandit algorithm designed to satisfy a broad
family of fairness constraints. Our algorithm accepts multiple fairness definitions and allows …