Mostly exploration-free algorithms for contextual bandits

H Bastani, M Bayati, K Khosravi - Management Science, 2021 - pubsonline.informs.org
The contextual bandit literature has traditionally focused on algorithms that address the
exploration–exploitation tradeoff. In particular, greedy algorithms that exploit current …

Rate-optimal Bayesian simple regret in best arm identification

J Komiyama, K Ariu, M Kato… - Mathematics of Operations …, 2024 - pubsonline.informs.org
We consider best arm identification in the multiarmed bandit problem. Assuming certain
continuity conditions of the prior, we characterize the rate of the Bayesian simple regret …

Finding the optimal exploration-exploitation trade-off online through Bayesian risk estimation and minimization

S Jamieson, JP How, Y Girdhar - Artificial Intelligence, 2024 - Elsevier
We propose endogenous Bayesian risk minimization (EBRM) over policy sets as an
approach to online learning across a wide range of settings. Many real-world online learning …

Information-Directed Sampling - Frequentist Analysis and Applications

J Kirschner - 2021 - research-collection.ethz.ch
Sequential decision-making is an iterative process between a learning agent and an
environment. We study the stochastic setting, where the learner chooses an action in each …

Inference of a Firm's Learning Process from Product Launches

LB Ano, V Martinez-de-Albeniz - 2023 - papers.ssrn.com
In dynamic business environments, firms must make sequential decisions that account for
changes in consumer interests. As consumer interests gradually evolve, firms need to be …

NFSP-PLT: Solving games with a weighted NFSP-PER-based method

H Li, S Qi, J Zhang, D Zhang, L Yao, X Wang, Q Li… - Electronics, 2023 - mdpi.com
A Nash equilibrium strategy is a typical goal when solving two-player imperfect-information
games (IIGs). Neural fictitious self-play (NFSP) is a popular method to find the Nash …

Asymptotic Randomised Control with applications to bandits

SN Cohen, T Treetanthiploet - arXiv preprint arXiv:2010.07252, 2020 - arxiv.org
We consider a general multi-armed bandit problem with correlated (and simple contextual
and restless) elements, as a relaxed control problem. By introducing an entropy …

Correlated bandits for dynamic pricing via the ARC algorithm

SN Cohen, T Treetanthiploet - arXiv preprint arXiv:2102.04263, 2021 - academia.edu
The Asymptotic Randomised Control (ARC) algorithm provides a rigorous
approximation to the optimal strategy for a wide class of Bayesian bandits, while retaining …

On adaptivity and confounding in contextual bandit experiments

C Qin, D Russo - NeurIPS 2021 Workshop on Distribution Shifts …, 2021 - openreview.net
Multi-armed bandit algorithms minimize experimentation costs required to converge on
optimal behavior. They do so by rapidly adapting experimentation effort away from poorly …

Dynamic mean field programming

G Stamatescu - arXiv preprint arXiv:2206.05200, 2022 - arxiv.org
A dynamic mean field theory is developed for finite state and action Bayesian reinforcement
learning in the large state space limit. In an analogy with statistical physics, the Bellman …