Contextual bandits with packing and covering constraints: A modular Lagrangian approach via regression
A Slivkins, KA Sankararaman… - The Thirty Sixth Annual …, 2023 - proceedings.mlr.press
We consider contextual bandits with linear constraints (CBwLC), a variant of contextual
bandits in which the algorithm consumes multiple resources subject to linear constraints on …
Rectified pessimistic-optimistic learning for stochastic continuum-armed bandit with constraints
This paper studies the problem of stochastic continuum-armed bandit with constraints
(SCBwC), where we optimize a black-box reward function $f(x)$ subject to a black-box …
Bandits with knapsacks: advice on time-varying demands
L Lyu, WC Cheung - International Conference on Machine …, 2023 - proceedings.mlr.press
We consider a non-stationary Bandits with Knapsack problem. The outcome distribution at
each time is scaled by a non-stationary quantity that signifies changing demand volumes …
Approximately stationary bandits with knapsacks
G Fikioris, É Tardos - The Thirty Sixth Annual Conference on …, 2023 - proceedings.mlr.press
Bandits with Knapsacks (BwK), the generalization of the Multi-Armed Bandits
problem under global budget constraints, has received a lot of attention in recent years. It …
Optimal arms identification with knapsacks
Best Arm Identification (BAI) is a general online pure exploration framework to
identify optimal decisions among candidates via sequential interactions. We pioneer the …
On dynamic pricing with covariates
We consider dynamic pricing with covariates under a generalized linear demand model: a
seller can dynamically adjust the price of a product over a horizon of $T$ time periods, and …
Constrained Online Two-stage Stochastic Optimization: Algorithm with (and without) Predictions
P Hu, J Jiang, G Lyu, H Su - arXiv preprint arXiv:2401.01077, 2024 - arxiv.org
We consider an online two-stage stochastic optimization with long-term constraints over a
finite horizon of $T$ periods. At each period, we take the first-stage action, observe a model …
Safe Reinforcement Learning with Instantaneous Constraints: The Role of Aggressive Exploration
This paper studies safe Reinforcement Learning (safe RL) with linear function approximation
and under hard instantaneous constraints where unsafe actions must be avoided at each …
High-dimensional Linear Bandits with Knapsacks
We study the contextual bandits with knapsack (CBwK) problem under the high-dimensional
setting where the dimension of the feature is large. The reward of pulling each arm equals …
Towards Better Statistical Understanding of Watermarking LLMs
In this paper, we study the problem of watermarking large language models (LLMs). We
consider the trade-off between model distortion and detection ability and formulate it as a …