Learning policies with zero or bounded constraint violation for constrained MDPs
We address the issue of safety in reinforcement learning. We pose the problem in an
episodic framework of a constrained Markov decision process. Existing results have shown …
Stochastic bandits with linear constraints
A Pacchiano, M Ghavamzadeh… - International …, 2021 - proceedings.mlr.press
We study a constrained contextual linear bandit setting, where the goal of the agent is to
produce a sequence of policies, whose expected cumulative reward over the course of …
Safe reinforcement learning with linear function approximation
S Amani, C Thrampoulidis… - … Conference on Machine …, 2021 - proceedings.mlr.press
Safety in reinforcement learning has become increasingly important in recent years. Yet,
existing solutions either fail to strictly avoid choosing unsafe actions, which may lead to …
On kernelized multi-armed bandits with constraints
We study a stochastic bandit problem with a general unknown reward function and a
general unknown constraint function. Both functions can be non-linear (even non-convex) …
Learning to persuade on the fly: Robustness against ignorance
We study a repeated persuasion setting between a sender and a receiver, where at each
time t, the sender shares information about a payoff-relevant state with the receiver. The …
An accurate non-accelerometer-based PPG motion artifact removal technique using CycleGAN
AH Afandizadeh Zargari, SAH Aqajari… - ACM Transactions on …, 2023 - dl.acm.org
A photoplethysmography (PPG) is an uncomplicated and inexpensive optical technique
widely used in the healthcare domain to extract valuable health-related information, e.g. …
An efficient pessimistic-optimistic algorithm for stochastic linear bandits with general constraints
This paper considers stochastic linear bandits with general nonlinear constraints. The
objective is to maximize the expected cumulative reward over horizon $T$ subject to a set …
Best arm identification with safety constraints
Z Wang, AJ Wagenmaker… - … Conference on Artificial …, 2022 - proceedings.mlr.press
The best arm identification problem in the multi-armed bandit setting is an excellent model of
many real-world decision-making problems, yet it fails to capture the fact that in the real …
Directional optimism for safe linear bandits
S Hutchinson, B Turan… - … Conference on Artificial …, 2024 - proceedings.mlr.press
The safe linear bandit problem is a version of the classical stochastic linear bandit problem
where the learner's actions must satisfy an uncertain constraint at all rounds. Due to its …
Pure exploration in bandits with linear constraints
We address the problem of identifying the optimal policy with a fixed confidence level in a
multi-armed bandit setup, when the arms are subject to linear constraints. Unlike the …