Frontiers in Service Science: Data-Driven Revenue Management: The Interplay of Data, Model, and Decisions
Revenue management (RM) is the application of analytical methodologies and tools that
predict consumer behavior and optimize product availability and prices to maximize a firm's …
Linear bandits with limited adaptivity and learning distributional optimal design
Motivated by practical needs such as large-scale learning, we study the impact of adaptivity
constraints to linear contextual bandits, a central problem in online learning and decision …
Near-optimal regret bounds for multi-batch reinforcement learning
In this paper, we study the episodic reinforcement learning (RL) problem modeled by finite-
horizon Markov Decision Processes (MDPs) with constraint on the number of batches. The …
Towards scalable and robust structured bandits: A meta-learning framework
Online learning in large-scale structured bandits is known to be challenging due to the curse
of dimensionality. In this paper, we propose a unified meta-learning framework for a wide …
Phase transitions and cyclic phenomena in bandits with switching constraints
D Simchi-Levi, Y Xu - Advances in Neural Information …, 2019 - proceedings.neurips.cc
We consider the classical stochastic multi-armed bandit problem with a constraint on the
total cost incurred by switching between actions. Under the unit switching cost structure …
Conservative exploration in reinforcement learning
E Garcelon, M Ghavamzadeh… - International …, 2020 - proceedings.mlr.press
While learning in an unknown Markov Decision Process (MDP), an agent should trade off
exploration to discover new information about the MDP, and exploitation of the current …
Reinforcement learning with logarithmic regret and policy switches
In this paper, we study the problem of regret minimization for episodic Reinforcement
Learning (RL) both in the model-free and the model-based setting. We focus on learning …
UCB-based algorithms for multinomial logistic regression bandits
S Amani, C Thrampoulidis - Advances in Neural …, 2021 - proceedings.neurips.cc
Out of the rich family of generalized linear bandits, perhaps the most well studied ones are
logistic bandits that are used in problems with binary rewards: for instance, when the learner …
Contextual Multinomial Logit Bandits with General Value Functions
Contextual multinomial logit (MNL) bandits capture many real-world assortment
recommendation problems such as online retailing/advertising. However, prior work has …
Online convex optimization with continuous switching constraint
In many sequential decision making applications, the change of decision would bring an
additional cost, such as the wear-and-tear cost associated with changing server status. To …