DOPE: Doubly optimistic and pessimistic exploration for safe reinforcement learning

A Bura, A HasanzadeZonuzy… - Advances in neural …, 2022 - proceedings.neurips.cc
Safe reinforcement learning is extremely challenging--not only must the agent explore an
unknown environment, it must do so while ensuring no safety constraint violations. We …

Triple-Q: A model-free algorithm for constrained reinforcement learning with sublinear regret and zero constraint violation

H Wei, X Liu, L Ying - International Conference on Artificial …, 2022 - proceedings.mlr.press
This paper presents the first model-free, simulator-free reinforcement learning algorithm for
Constrained Markov Decision Processes (CMDPs) with sublinear regret and zero constraint …

A provably-efficient model-free algorithm for infinite-horizon average-reward constrained Markov decision processes

H Wei, X Liu, L Ying - Proceedings of the AAAI Conference on Artificial …, 2022 - ojs.aaai.org
This paper presents a model-free reinforcement learning (RL) algorithm for infinite-horizon
average-reward Constrained Markov Decision Processes (CMDPs). Considering a learning …

On kernelized multi-armed bandits with constraints

X Zhou, B Ji - Advances in neural information processing …, 2022 - proceedings.neurips.cc
We study a stochastic bandit problem with a general unknown reward function and a
general unknown constraint function. Both functions can be non-linear (even non-convex) …

Combinatorial bandits with linear constraints: Beyond knapsacks and fairness

Q Liu, W Xu, S Wang, Z Fang - Advances in Neural …, 2022 - proceedings.neurips.cc
This paper proposes and studies for the first time the problem of combinatorial multi-armed
bandits with linear long-term constraints. Our model generalizes and unifies several …

Learning to schedule tasks with deadline and throughput constraints

Q Liu, Z Fang - IEEE INFOCOM 2023-IEEE Conference on …, 2023 - ieeexplore.ieee.org
We consider the task scheduling scenario where the controller activates one from K task
types at each time. Each task induces a random completion time, and a reward is obtained …

Learning while scheduling in multi-server systems with unknown statistics: MaxWeight with discounted UCB

Z Yang, R Srikant, L Ying - International Conference on …, 2023 - proceedings.mlr.press
Multi-server queueing systems are widely used models for job scheduling in machine
learning, wireless networks, and crowdsourcing. This paper considers a multi-server system …

Learning-based scheduling for information gathering with QoS constraints

Q Liu, W Xu, Z Fang - IEEE INFOCOM 2024-IEEE Conference …, 2024 - ieeexplore.ieee.org
The problem of scheduling packets from multiple sources over unreliable channels has
attracted much attention due to its great practicability in the Internet of things systems. Most …

Safe learning in tree-form sequential decision making: Handling hard and soft constraints

M Bernasconi, F Cacciamani… - International …, 2022 - proceedings.mlr.press
We study decision making problems in which an agent sequentially interacts with a
stochastic environment defined by means of a tree structure. The agent repeatedly faces the …

Optimization of offloading policies for accuracy-delay tradeoffs in hierarchical inference

HB Beytur, AG Aydin, G de Veciana… - IEEE INFOCOM 2024 …, 2024 - ieeexplore.ieee.org
We consider a hierarchical inference system with multiple clients connected to a server via a
shared communication resource. When necessary, clients with low-accuracy machine …