Multinomial logit bandit with low switching cost

Frontiers in Service Science: Data-Driven Revenue Management: The Interplay of Data, Model, and Decisions

N Chen, M Hu - Service Science, 2023 - pubsonline.informs.org

Revenue management (RM) is the application of analytical methodologies and tools that
predict consumer behavior and optimize product availability and prices to maximize a firm's …

Cited by 16 - Related articles - All 5 versions

Linear bandits with limited adaptivity and learning distributional optimal design

Y Ruan, J Yang, Y Zhou - Proceedings of the 53rd Annual ACM SIGACT …, 2021 - dl.acm.org

Motivated by practical needs such as large-scale learning, we study the impact of adaptivity
constraints to linear contextual bandits, a central problem in online learning and decision …

Cited by 59 - Related articles - All 6 versions

Near-optimal regret bounds for multi-batch reinforcement learning

Z Zhang, Y Jiang, Y Zhou, X Ji - Advances in Neural …, 2022 - proceedings.neurips.cc

In this paper, we study the episodic reinforcement learning (RL) problem modeled by finite-
horizon Markov Decision Processes (MDPs) with constraint on the number of batches. The …

Cited by 11 - Related articles - All 9 versions

Towards scalable and robust structured bandits: A meta-learning framework

R Wan, L Ge, R Song - International Conference on Artificial …, 2023 - proceedings.mlr.press

Online learning in large-scale structured bandits is known to be challenging due to the curse
of dimensionality. In this paper, we propose a unified meta-learning framework for a wide …

Cited by 13 - Related articles - All 3 versions

Phase transitions and cyclic phenomena in bandits with switching constraints

D Simchi-Levi, Y Xu - Advances in Neural Information …, 2019 - proceedings.neurips.cc

We consider the classical stochastic multi-armed bandit problem with a constraint on the
total cost incurred by switching between actions. Under the unit switching cost structure …

Cited by 38 - Related articles - All 13 versions

Conservative exploration in reinforcement learning

E Garcelon, M Ghavamzadeh… - International …, 2020 - proceedings.mlr.press

While learning in an unknown Markov Decision Process (MDP), an agent should trade off
exploration to discover new information about the MDP, and exploitation of the current …

Cited by 29 - Related articles - All 11 versions

Reinforcement learning with logarithmic regret and policy switches

G Velegkas, Z Yang, A Karbasi - Advances in Neural …, 2022 - proceedings.neurips.cc

In this paper, we study the problem of regret minimization for episodic Reinforcement
Learning (RL) both in the model-free and the model-based setting. We focus on learning …

Cited by 2 - Related articles - All 4 versions

UCB-based algorithms for multinomial logistic regression bandits

S Amani, C Thrampoulidis - Advances in Neural …, 2021 - proceedings.neurips.cc

Out of the rich family of generalized linear bandits, perhaps the most well studied ones are
logistic bandits that are used in problems with binary rewards: for instance, when the learner …

Cited by 10 - Related articles - All 7 versions

Contextual Multinomial Logit Bandits with General Value Functions

M Zhang, H Luo - arXiv preprint arXiv:2402.08126, 2024 - arxiv.org

Contextual multinomial logit (MNL) bandits capture many real-world assortment
recommendation problems such as online retailing/advertising. However, prior work has …

Cited by 1 - Related articles - All 2 versions

Online convex optimization with continuous switching constraint

G Wang, Y Wan, T Yang… - Advances in Neural …, 2021 - proceedings.neurips.cc

In many sequential decision making applications, the change of decision would bring an
additional cost, such as the wear-and-tear cost associated with changing server status. To …

Cited by 10 - Related articles - All 12 versions