Frontiers in Service Science: Data-Driven Revenue Management: The Interplay of Data, Model, and Decisions

N Chen, M Hu - Service Science, 2023 - pubsonline.informs.org
Revenue management (RM) is the application of analytical methodologies and tools that
predict consumer behavior and optimize product availability and prices to maximize a firm's …

Linear bandits with limited adaptivity and learning distributional optimal design

Y Ruan, J Yang, Y Zhou - Proceedings of the 53rd Annual ACM SIGACT …, 2021 - dl.acm.org
Motivated by practical needs such as large-scale learning, we study the impact of adaptivity
constraints to linear contextual bandits, a central problem in online learning and decision …

Near-optimal regret bounds for multi-batch reinforcement learning

Z Zhang, Y Jiang, Y Zhou, X Ji - Advances in Neural …, 2022 - proceedings.neurips.cc
In this paper, we study the episodic reinforcement learning (RL) problem modeled by
finite-horizon Markov Decision Processes (MDPs) with constraint on the number of batches. The …

Towards scalable and robust structured bandits: A meta-learning framework

R Wan, L Ge, R Song - International Conference on Artificial …, 2023 - proceedings.mlr.press
Online learning in large-scale structured bandits is known to be challenging due to the curse
of dimensionality. In this paper, we propose a unified meta-learning framework for a wide …

Phase transitions and cyclic phenomena in bandits with switching constraints

D Simchi-Levi, Y Xu - Advances in Neural Information …, 2019 - proceedings.neurips.cc
We consider the classical stochastic multi-armed bandit problem with a constraint on the
total cost incurred by switching between actions. Under the unit switching cost structure …

Conservative exploration in reinforcement learning

E Garcelon, M Ghavamzadeh… - International …, 2020 - proceedings.mlr.press
While learning in an unknown Markov Decision Process (MDP), an agent should trade off
exploration to discover new information about the MDP, and exploitation of the current …

Reinforcement learning with logarithmic regret and policy switches

G Velegkas, Z Yang, A Karbasi - Advances in Neural …, 2022 - proceedings.neurips.cc
In this paper, we study the problem of regret minimization for episodic Reinforcement
Learning (RL) both in the model-free and the model-based setting. We focus on learning …

UCB-based algorithms for multinomial logistic regression bandits

S Amani, C Thrampoulidis - Advances in Neural …, 2021 - proceedings.neurips.cc
Out of the rich family of generalized linear bandits, perhaps the most well studied ones are
logistic bandits that are used in problems with binary rewards: for instance, when the learner …

Contextual Multinomial Logit Bandits with General Value Functions

M Zhang, H Luo - arXiv preprint arXiv:2402.08126, 2024 - arxiv.org
Contextual multinomial logit (MNL) bandits capture many real-world assortment
recommendation problems such as online retailing/advertising. However, prior work has …

Online convex optimization with continuous switching constraint

G Wang, Y Wan, T Yang… - Advances in Neural …, 2021 - proceedings.neurips.cc
In many sequential decision making applications, the change of decision would bring an
additional cost, such as the wear-and-tear cost associated with changing server status. To …