Contextual bandits with packing and covering constraints: A modular Lagrangian approach via regression

A Slivkins, KA Sankararaman… - The Thirty Sixth Annual …, 2023 - proceedings.mlr.press
We consider contextual bandits with linear constraints (CBwLC), a variant of contextual
bandits in which the algorithm consumes multiple resources subject to linear constraints on …

Rectified pessimistic-optimistic learning for stochastic continuum-armed bandit with constraints

H Guo, Z Qi, X Liu - Learning for Dynamics and Control …, 2023 - proceedings.mlr.press
This paper studies the problem of stochastic continuum-armed bandit with constraints
(SCBwC), where we optimize a black-box reward function $f(x)$ subject to a black-box …

Bandits with knapsacks: advice on time-varying demands

L Lyu, WC Cheung - International Conference on Machine …, 2023 - proceedings.mlr.press
We consider a non-stationary Bandits with Knapsack problem. The outcome distribution at
each time is scaled by a non-stationary quantity that signifies changing demand volumes …

Approximately stationary bandits with knapsacks

G Fikioris, É Tardos - The Thirty Sixth Annual Conference on …, 2023 - proceedings.mlr.press
Bandits with Knapsacks (BwK), the generalization of the Multi-Armed Bandits
problem under global budget constraints, has received a lot of attention in recent years. It …

Optimal arms identification with knapsacks

S Li, L Zhang, Y Yu, X Li - International Conference on …, 2023 - proceedings.mlr.press
Best Arm Identification (BAI) is a general online pure exploration framework to
identify optimal decisions among candidates via sequential interactions. We pioneer the …

On dynamic pricing with covariates

H Wang, K Talluri, X Li - arXiv preprint arXiv:2112.13254, 2021 - arxiv.org
We consider dynamic pricing with covariates under a generalized linear demand model: a
seller can dynamically adjust the price of a product over a horizon of $T$ time periods, and …

Constrained Online Two-stage Stochastic Optimization: Algorithm with (and without) Predictions

P Hu, J Jiang, G Lyu, H Su - arXiv preprint arXiv:2401.01077, 2024 - arxiv.org
We consider an online two-stage stochastic optimization with long-term constraints over a
finite horizon of $T$ periods. At each period, we take the first-stage action, observe a model …

Safe Reinforcement Learning with Instantaneous Constraints: The Role of Aggressive Exploration

H Wei, X Liu, L Ying - Proceedings of the AAAI Conference on Artificial …, 2024 - ojs.aaai.org
This paper studies safe Reinforcement Learning (safe RL) with linear function approximation
and under hard instantaneous constraints where unsafe actions must be avoided at each …

High-dimensional Linear Bandits with Knapsacks

W Ma, D Xia, J Jiang - arXiv preprint arXiv:2311.01327, 2023 - arxiv.org
We study the contextual bandits with knapsack (CBwK) problem under the high-dimensional
setting where the dimension of the feature is large. The reward of pulling each arm equals …

Towards Better Statistical Understanding of Watermarking LLMs

Z Cai, S Liu, H Wang, H Zhong, X Li - arXiv preprint arXiv:2403.13027, 2024 - arxiv.org
In this paper, we study the problem of watermarking large language models (LLMs). We
consider the trade-off between model distortion and detection ability and formulate it as a …