Mostly exploration-free algorithms for contextual bandits
The contextual bandit literature has traditionally focused on algorithms that address the
exploration–exploitation tradeoff. In particular, greedy algorithms that exploit current …
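The snippet describes greedy, exploration-free contextual bandit algorithms that act purely on current model estimates. As a minimal illustration only (not the paper's algorithm; the class name and parameters are invented for this sketch), a greedy linear contextual bandit keeps a per-arm ridge-regression estimate and always pulls the arm with the highest predicted reward:

```python
import numpy as np

class GreedyLinearBandit:
    """Greedy linear contextual bandit: always exploit the current
    ridge-regression estimate, with no explicit exploration bonus."""

    def __init__(self, n_arms, dim, reg=1.0):
        self.A = [reg * np.eye(dim) for _ in range(n_arms)]  # Gram matrices
        self.b = [np.zeros(dim) for _ in range(n_arms)]      # response vectors

    def choose(self, context):
        # Estimate each arm's reward under the current model and pick
        # the argmax (pure exploitation).
        estimates = [context @ np.linalg.solve(A, b)
                     for A, b in zip(self.A, self.b)]
        return int(np.argmax(estimates))

    def update(self, arm, context, reward):
        self.A[arm] += np.outer(context, context)
        self.b[arm] += reward * context
```

Roughly, the point of this line of work is that sufficient diversity in the observed contexts can supply enough implicit exploration for such greedy rules to perform well without an explicit exploration term.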
Rate-optimal Bayesian simple regret in best arm identification
We consider best arm identification in the multi-armed bandit problem. Assuming certain
continuity conditions of the prior, we characterize the rate of the Bayesian simple regret …
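Here the Bayesian simple regret is the prior-averaged gap between the best arm's mean and the mean of the arm recommended at the end of the experiment. A minimal Monte Carlo sketch of that quantity, assuming a Gaussian prior, uniform allocation, and an empirical-best recommendation rule (all choices are illustrative, not the paper's setting):

```python
import numpy as np

def bayes_simple_regret(n_arms=5, budget=100, n_mc=10_000, seed=0):
    """Monte Carlo estimate of the Bayesian simple regret
    E[max_i mu_i - mu_{I_hat}] under a N(0, 1) prior on each arm mean,
    uniform allocation, and empirical-best-arm recommendation."""
    rng = np.random.default_rng(seed)
    pulls = budget // n_arms
    regrets = []
    for _ in range(n_mc):
        mu = rng.normal(0.0, 1.0, n_arms)           # draw means from the prior
        samples = rng.normal(mu, 1.0, (pulls, n_arms))
        i_hat = samples.mean(axis=0).argmax()       # recommend the empirical best
        regrets.append(mu.max() - mu[i_hat])
    return float(np.mean(regrets))

print(bayes_simple_regret())
```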
Finding the optimal exploration-exploitation trade-off online through Bayesian risk estimation and minimization
We propose endogenous Bayesian risk minimization (EBRM) over policy sets as an
approach to online learning across a wide range of settings. Many real-world online learning …
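The snippet is truncated, but the underlying notion of the Bayesian risk of a policy (its expected regret averaged over environments drawn from the prior) is standard. Below is a generic illustration of scoring candidate policies by simulated Bayesian risk and selecting the minimizer; this is not the paper's EBRM algorithm, and every name and constant is invented for the example:

```python
import numpy as np

def policy_bayes_risk(policy, horizon, n_sims, rng):
    """Estimate a policy's Bayesian risk (expected cumulative regret) by
    simulating two-armed Bernoulli bandits drawn from a uniform prior."""
    risks = []
    for _ in range(n_sims):
        p = rng.uniform(size=2)                  # arm means from the prior
        counts, wins = np.zeros(2), np.zeros(2)
        regret = 0.0
        for t in range(horizon):
            a = policy(counts, wins, t, rng)
            r = rng.random() < p[a]
            counts[a] += 1
            wins[a] += r
            regret += p.max() - p[a]
        risks.append(regret)
    return float(np.mean(risks))

def eps_greedy(eps):
    def policy(counts, wins, t, rng):
        if rng.random() < eps or counts.min() == 0:
            return int(rng.integers(2))          # explore
        return int(np.argmax(wins / counts))     # exploit
    return policy

rng = np.random.default_rng(0)
risks = {e: policy_bayes_risk(eps_greedy(e), 200, 500, rng)
         for e in (0.0, 0.05, 0.2)}
print(risks, "->", min(risks, key=risks.get))    # pick the risk-minimizing policy
```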
Information-Directed Sampling: Frequentist Analysis and Applications
J Kirschner - 2021 - research-collection.ethz.ch
Sequential decision-making is an iterative process between a learning agent and an
environment. We study the stochastic setting, where the learner chooses an action in each …
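Information-directed sampling (IDS) chooses actions by trading expected regret against information gained about the optimal action. The sketch below is a simplified, deterministic variant for a Bernoulli bandit with Beta posteriors, using a variance-based information-gain proxy rather than mutual information; the thesis itself treats far more general settings:

```python
import numpy as np

def ids_choose(alpha, beta, n_samples=2000, rng=None):
    """One step of a simplified information-directed sampling rule:
    pick the arm minimizing (expected regret)^2 / information gain,
    with posteriors Beta(alpha[i], beta[i]) and all quantities
    approximated from posterior samples."""
    rng = rng or np.random.default_rng()
    theta = rng.beta(alpha, beta, size=(n_samples, len(alpha)))  # posterior draws
    star = theta.argmax(axis=1)                   # sampled identity of the best arm
    mean = theta.mean(axis=0)
    delta = theta.max(axis=1).mean() - mean       # expected regret of each arm
    # Variance-based information gain: how much each arm's posterior mean
    # varies across hypotheses about which arm is optimal.
    info = np.zeros_like(mean)
    for s in range(len(alpha)):
        mask = star == s
        if mask.any():
            info += mask.mean() * (theta[mask].mean(axis=0) - mean) ** 2
    return int(np.argmin(delta ** 2 / (info + 1e-12)))

alpha = np.array([2.0, 1.0, 1.0])
beta = np.array([1.0, 1.0, 2.0])
print(ids_choose(alpha, beta, rng=np.random.default_rng(0)))
```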
Inference of a Firm's Learning Process from Product Launches
LB Ano, V Martinez-de-Albeniz - 2023 - papers.ssrn.com
In dynamic business environments, firms must make sequential decisions that account for
changes in consumer interests. As consumer interests gradually evolve, firms need to be …
NFSP-PLT: Solving games with a weighted NFSP-PER-based method
H Li, S Qi, J Zhang, D Zhang, L Yao, X Wang, Q Li… - Electronics, 2023 - mdpi.com
Nash equilibrium strategy is a typical goal when solving two-player imperfect-information
games (IIGs). Neural fictitious self-play (NFSP) is a popular method to find the Nash …
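NFSP combines deep reinforcement learning with fictitious self-play (here augmented with prioritized experience replay). As a toy illustration of the underlying fictitious-play principle only, not of NFSP itself, classical fictitious play on rock-paper-scissors drives each player's empirical strategy toward the Nash equilibrium (1/3, 1/3, 1/3):

```python
import numpy as np

# Rock-paper-scissors payoffs for player 0 (zero-sum; player 1 gets the negative).
PAYOFF = np.array([[0, -1, 1],
                   [1, 0, -1],
                   [-1, 1, 0]])

counts = np.ones((2, 3))  # empirical action counts for both players
for t in range(100_000):
    # Each player best-responds to the opponent's empirical mixed strategy.
    a0 = int(np.argmax(PAYOFF @ (counts[1] / counts[1].sum())))
    a1 = int(np.argmax(-PAYOFF.T @ (counts[0] / counts[0].sum())))
    counts[0, a0] += 1
    counts[1, a1] += 1

print(counts / counts.sum(axis=1, keepdims=True))  # -> approx. (1/3, 1/3, 1/3)
```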
Asymptotic Randomised Control with applications to bandits
SN Cohen, T Treetanthiploet - arXiv preprint arXiv:2010.07252, 2020 - arxiv.org
We consider a general multi-armed bandit problem with correlated (and simple contextual
and restless) elements, as a relaxed control problem. By introducing an entropy …
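The core relaxation here is entropy regularization of the control problem. As a toy illustration of that smoothing idea only (not the ARC algorithm), the entropy-regularized counterpart of an argmax over index values is a softmax, which interpolates between greedy and uniform play as the regularization weight grows:

```python
import numpy as np

def entropy_relaxed_policy(values, reg):
    """Gibbs/softmax distribution over arms: the maximizer of
    <p, values> + reg * H(p), i.e. the entropy-regularized
    relaxation of a hard argmax."""
    z = np.asarray(values) / reg
    z -= z.max()                     # numerical stability
    p = np.exp(z)
    return p / p.sum()

print(entropy_relaxed_policy([1.0, 0.9, 0.2], reg=0.1))  # nearly greedy
print(entropy_relaxed_policy([1.0, 0.9, 0.2], reg=5.0))  # nearly uniform
```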
Correlated bandits for dynamic pricing via the ARC algorithm
SN Cohen, T Treetanthiploet - arXiv preprint arXiv:2102.04263, 2021 - academia.edu
The Asymptotic Randomised Control (ARC) algorithm provides a rigorous
approximation to the optimal strategy for a wide class of Bayesian bandits, while retaining …
On adaptivity and confounding in contextual bandit experiments
C Qin, D Russo - NeurIPS 2021 Workshop on Distribution Shifts …, 2021 - openreview.net
Multi-armed bandit algorithms minimize experimentation costs required to converge on
optimal behavior. They do so by rapidly adapting experimentation effort away from poorly …
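As a concrete example of adapting experimentation effort away from poorly performing arms, here is a standard successive-elimination sketch (generic, not this paper's experimental design; all constants are illustrative): surviving arms are sampled in rounds, and any arm whose confidence interval falls below the leader's stops receiving samples.

```python
import numpy as np

def successive_elimination(true_means, budget, delta=0.05, seed=0):
    """Sample surviving arms in rounds and drop any arm whose upper
    confidence bound falls below the leader's lower bound, so the
    sampling effort concentrates away from clearly suboptimal arms."""
    rng = np.random.default_rng(seed)
    k = len(true_means)
    active = np.ones(k, dtype=bool)
    counts, sums = np.zeros(k), np.zeros(k)
    spent = 0
    while spent < budget and active.sum() > 1:
        for a in np.flatnonzero(active):
            sums[a] += rng.normal(true_means[a], 1.0)
            counts[a] += 1
            spent += 1
        mu = sums / np.maximum(counts, 1)
        rad = np.sqrt(2 * np.log(1 / delta) / np.maximum(counts, 1))
        best_lcb = (mu - rad)[active].max()
        active &= (mu + rad) >= best_lcb          # eliminate dominated arms
    return np.flatnonzero(active), counts

arms, pulls = successive_elimination([0.1, 0.2, 0.8], budget=3000)
print(arms, pulls)   # surviving arm(s) and how the effort was allocated
```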
Dynamic mean field programming
G Stamatescu - arXiv preprint arXiv:2206.05200, 2022 - arxiv.org
A dynamic mean field theory is developed for finite state and action Bayesian reinforcement
learning in the large state space limit. In an analogy with statistical physics, the Bellman …
learning in the large state space limit. In an analogy with statistical physics, the Bellman …