An efficient pessimistic-optimistic algorithm for stochastic linear bandits with general constraints

A Bura, A HasanzadeZonuzy… - Advances in neural …, 2022 - proceedings.neurips.cc

Safe reinforcement learning is extremely challenging--not only must the agent explore an
unknown environment, it must do so while ensuring no safety constraint violations. We …

Cited by 39 · Related articles · All 7 versions

[PDF] mlr.press

Triple-Q: A model-free algorithm for constrained reinforcement learning with sublinear regret and zero constraint violation

H Wei, X Liu, L Ying - International Conference on Artificial …, 2022 - proceedings.mlr.press

This paper presents the first model-free, simulator-free reinforcement learning algorithm for
Constrained Markov Decision Processes (CMDPs) with sublinear regret and zero constraint …

Cited by 36 · Related articles · All 6 versions

[PDF] aaai.org

A provably-efficient model-free algorithm for infinite-horizon average-reward constrained Markov decision processes

H Wei, X Liu, L Ying - Proceedings of the AAAI Conference on Artificial …, 2022 - ojs.aaai.org

This paper presents a model-free reinforcement learning (RL) algorithm for infinite-horizon
average-reward Constrained Markov Decision Processes (CMDPs). Considering a learning …

Cited by 31 · Related articles · All 6 versions

[PDF] neurips.cc

On kernelized multi-armed bandits with constraints

X Zhou, B Ji - Advances in neural information processing …, 2022 - proceedings.neurips.cc

We study a stochastic bandit problem with a general unknown reward function and a
general unknown constraint function. Both functions can be non-linear (even non-convex) …

Cited by 30 · Related articles · All 10 versions

[PDF] neurips.cc

Combinatorial bandits with linear constraints: Beyond knapsacks and fairness

Q Liu, W Xu, S Wang, Z Fang - Advances in Neural …, 2022 - proceedings.neurips.cc

This paper proposes and studies for the first time the problem of combinatorial multi-armed
bandits with linear long-term constraints. Our model generalizes and unifies several …

Cited by 19 · Related articles · All 5 versions

[PDF] openreview.net

Learning to schedule tasks with deadline and throughput constraints

Q Liu, Z Fang - IEEE INFOCOM 2023-IEEE Conference on …, 2023 - ieeexplore.ieee.org

We consider the task scheduling scenario where the controller activates one from K task
types at each time. Each task induces a random completion time, and a reward is obtained …

Cited by 17 · Related articles · All 2 versions

[PDF] mlr.press

Learning while scheduling in multi-server systems with unknown statistics: MaxWeight with discounted UCB

Z Yang, R Srikant, L Ying - International Conference on …, 2023 - proceedings.mlr.press

Multi-server queueing systems are widely used models for job scheduling in machine
learning, wireless networks, and crowdsourcing. This paper considers a multi-server system …

Cited by 18 · Related articles · All 6 versions

[PDF] researchgate.net

Learning-based scheduling for information gathering with QoS constraints

Q Liu, W Xu, Z Fang - IEEE INFOCOM 2024-IEEE Conference …, 2024 - ieeexplore.ieee.org

The problem of scheduling packets from multiple sources over unreliable channels has
attracted much attention due to its great practicability in the Internet of things systems. Most …

Cited by 3 · Related articles

[PDF] mlr.press

Safe learning in tree-form sequential decision making: Handling hard and soft constraints

M Bernasconi, F Cacciamani… - International …, 2022 - proceedings.mlr.press

We study decision making problems in which an agent sequentially interacts with a
stochastic environment defined by means of a tree structure. The agent repeatedly faces the …

Cited by 14 · Related articles · All 3 versions

Optimization of offloading policies for accuracy-delay tradeoffs in hierarchical inference

HB Beytur, AG Aydin, G de Veciana… - IEEE INFOCOM 2024 …, 2024 - ieeexplore.ieee.org

We consider a hierarchical inference system with multiple clients connected to a server via a
shared communication resource. When necessary, clients with low-accuracy machine …

Cited by 4 · Related articles · All 2 versions