Learning policies with zero or bounded constraint violation for constrained MDPs

T Liu, R Zhou, D Kalathil, P Kumar… - Advances in Neural …, 2021 - proceedings.neurips.cc
We address the issue of safety in reinforcement learning. We pose the problem in an
episodic framework of a constrained Markov decision process. Existing results have shown …

Stochastic bandits with linear constraints

A Pacchiano, M Ghavamzadeh… - International …, 2021 - proceedings.mlr.press
We study a constrained contextual linear bandit setting, where the goal of the agent is to
produce a sequence of policies, whose expected cumulative reward over the course of …

Safe reinforcement learning with linear function approximation

S Amani, C Thrampoulidis… - … Conference on Machine …, 2021 - proceedings.mlr.press
Safety in reinforcement learning has become increasingly important in recent years. Yet,
existing solutions either fail to strictly avoid choosing unsafe actions, which may lead to …

On kernelized multi-armed bandits with constraints

X Zhou, B Ji - Advances in neural information processing …, 2022 - proceedings.neurips.cc
We study a stochastic bandit problem with a general unknown reward function and a
general unknown constraint function. Both functions can be non-linear (even non-convex) …

Learning to persuade on the fly: Robustness against ignorance

Y Zu, K Iyer, H Xu - Proceedings of the 22nd ACM Conference on …, 2021 - dl.acm.org
We study a repeated persuasion setting between a sender and a receiver, where at each
time t, the sender shares information about a payoff-relevant state with the receiver. The …

An accurate non-accelerometer-based PPG motion artifact removal technique using CycleGAN

AH Afandizadeh Zargari, SAH Aqajari… - ACM Transactions on …, 2023 - dl.acm.org
A photoplethysmography (PPG) is an uncomplicated and inexpensive optical technique
widely used in the healthcare domain to extract valuable health-related information, e.g. …

An efficient pessimistic-optimistic algorithm for stochastic linear bandits with general constraints

X Liu, B Li, P Shi, L Ying - Advances in Neural Information …, 2021 - proceedings.neurips.cc
This paper considers stochastic linear bandits with general nonlinear constraints. The
objective is to maximize the expected cumulative reward over horizon $T$ subject to a set …

Best arm identification with safety constraints

Z Wang, AJ Wagenmaker… - … Conference on Artificial …, 2022 - proceedings.mlr.press
The best arm identification problem in the multi-armed bandit setting is an excellent model of
many real-world decision-making problems, yet it fails to capture the fact that in the real …

Directional optimism for safe linear bandits

S Hutchinson, B Turan… - … Conference on Artificial …, 2024 - proceedings.mlr.press
The safe linear bandit problem is a version of the classical stochastic linear bandit problem
where the learner's actions must satisfy an uncertain constraint at all rounds. Due to its …

Pure exploration in bandits with linear constraints

E Carlsson, D Basu, F Johansson… - International …, 2024 - proceedings.mlr.press
We address the problem of identifying the optimal policy with a fixed confidence level in a
multi-armed bandit setup, when the arms are subject to linear constraints. Unlike the …