- Academic resource search

Learning policies with zero or bounded constraint violation for constrained MDPs

T Liu, R Zhou, D Kalathil, P Kumar… - Advances in Neural …, 2021 - proceedings.neurips.cc

We address the issue of safety in reinforcement learning. We pose the problem in an
episodic framework of a constrained Markov decision process. Existing results have shown …

Cited by: 86 · Related articles · All 8 versions

Stochastic bandits with linear constraints

A Pacchiano, M Ghavamzadeh… - International …, 2021 - proceedings.mlr.press

We study a constrained contextual linear bandit setting, where the goal of the agent is to
produce a sequence of policies, whose expected cumulative reward over the course of …

Cited by: 82 · Related articles · All 6 versions

Safe reinforcement learning with linear function approximation

S Amani, C Thrampoulidis… - … Conference on Machine …, 2021 - proceedings.mlr.press

Safety in reinforcement learning has become increasingly important in recent years. Yet,
existing solutions either fail to strictly avoid choosing unsafe actions, which may lead to …

Cited by: 44 · Related articles · All 5 versions

On kernelized multi-armed bandits with constraints

X Zhou, B Ji - Advances in neural information processing …, 2022 - proceedings.neurips.cc

We study a stochastic bandit problem with a general unknown reward function and a
general unknown constraint function. Both functions can be non-linear (even non-convex) …

Cited by: 30 · Related articles · All 10 versions

Learning to persuade on the fly: Robustness against ignorance

Y Zu, K Iyer, H Xu - Proceedings of the 22nd ACM Conference on …, 2021 - dl.acm.org

We study a repeated persuasion setting between a sender and a receiver, where at each
time t, the sender shares information about a payoff-relevant state with the receiver. The …

Cited by: 49 · Related articles · All 6 versions

An accurate non-accelerometer-based PPG motion artifact removal technique using CycleGAN

AH Afandizadeh Zargari, SAH Aqajari… - ACM Transactions on …, 2023 - dl.acm.org

A photoplethysmography (PPG) is an uncomplicated and inexpensive optical technique
widely used in the healthcare domain to extract valuable health-related information, e.g., …

Cited by: 50 · Related articles · All 3 versions

An efficient pessimistic-optimistic algorithm for stochastic linear bandits with general constraints

X Liu, B Li, P Shi, L Ying - Advances in Neural Information …, 2021 - proceedings.neurips.cc

This paper considers stochastic linear bandits with general nonlinear constraints. The
objective is to maximize the expected cumulative reward over horizon $T$ subject to a set …

Cited by: 47 · Related articles · All 11 versions

Best arm identification with safety constraints

Z Wang, AJ Wagenmaker… - … Conference on Artificial …, 2022 - proceedings.mlr.press

The best arm identification problem in the multi-armed bandit setting is an excellent model of
many real-world decision-making problems, yet it fails to capture the fact that in the real …

Cited by: 25 · Related articles · All 4 versions

Directional optimism for safe linear bandits

S Hutchinson, B Turan… - … Conference on Artificial …, 2024 - proceedings.mlr.press

The safe linear bandit problem is a version of the classical stochastic linear bandit problem
where the learner's actions must satisfy an uncertain constraint at all rounds. Due to its …

Cited by: 3 · Related articles · All 2 versions

Pure exploration in bandits with linear constraints

E Carlsson, D Basu, F Johansson… - International …, 2024 - proceedings.mlr.press

We address the problem of identifying the optimal policy with a fixed confidence level in a
multi-armed bandit setup, when the arms are subject to linear constraints. Unlike the …

Cited by: 10 · Related articles · All 7 versions