Meta-Thompson sampling
Efficient exploration in bandits is a fundamental online learning problem. We propose a
variant of Thompson sampling that learns to explore better as it interacts with bandit …
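A minimal sketch of the idea, assuming Bernoulli arms and a Beta task prior: Thompson sampling runs as usual within each task, and across tasks the agent refines its estimate of the unknown prior. The moment-matching meta-update below is a crude stand-in for the paper's meta-posterior, and all names and constants are our own:

```python
import numpy as np

rng = np.random.default_rng(0)

def thompson_task(true_means, a, b, horizon=200):
    """Beta-Bernoulli Thompson sampling on one task under prior Beta(a, b)."""
    K = len(true_means)
    succ, fail = np.zeros(K), np.zeros(K)
    for _ in range(horizon):
        theta = rng.beta(a + succ, b + fail)   # one posterior sample per arm
        arm = int(np.argmax(theta))
        r = float(rng.random() < true_means[arm])
        succ[arm] += r
        fail[arm] += 1.0 - r
    return succ, fail

# Tasks arrive i.i.d. from an unknown Beta(a*, b*) prior over arm means.
a_star, b_star = 4.0, 2.0          # unknown to the agent (hypothetical values)
a_hat, b_hat = 1.0, 1.0            # agent's current prior estimate
K, n_tasks = 5, 30
for _ in range(n_tasks):
    means = rng.beta(a_star, b_star, size=K)       # sample a fresh bandit task
    succ, fail = thompson_task(means, a_hat, b_hat)
    # Crude meta-update: moment-match a Beta to the task's posterior arm means.
    p = (succ + 1.0) / (succ + fail + 2.0)
    m, v = p.mean(), max(float(p.var()), 1e-3)
    c = m * (1.0 - m) / v - 1.0
    a_hat, b_hat = max(m * c, 0.5), max((1.0 - m) * c, 0.5)
```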
No regrets for learning the prior in bandits
We propose AdaTS, a Thompson sampling algorithm that adapts sequentially to
bandit tasks that it interacts with. The key idea in AdaTS is to adapt to an unknown task prior …
Differentiable meta-learning of bandit policies
C Boutilier, C Hsu, B Kveton… - Advances in …, 2020 - proceedings.neurips.cc
Exploration policies in Bayesian bandits maximize the average reward over problem
instances drawn from some distribution P. In this work, we learn such policies for an …
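The snippet describes learning an exploration policy by optimizing average reward over instances drawn from P. A generic score-function (REINFORCE) sketch of that loop, with a single learnable softmax temperature standing in for the paper's richer policy class (the policy form, step sizes, and stand-in prior are our assumptions):

```python
import numpy as np

rng = np.random.default_rng(1)

def episode(theta, means, horizon=100):
    """Softmax-over-empirical-means policy with inverse temperature theta.
    Returns total reward and the summed score function d/dtheta log pi(a_t)."""
    K = len(means)
    n, s = np.ones(K), np.full(K, 0.5)        # smoothed pulls / successes
    total, score = 0.0, 0.0
    for _ in range(horizon):
        mu_hat = s / n
        logits = theta * mu_hat
        p = np.exp(logits - logits.max()); p /= p.sum()
        arm = rng.choice(K, p=p)
        r = float(rng.random() < means[arm])
        score += mu_hat[arm] - p @ mu_hat     # softmax score w.r.t. theta
        s[arm] += r; n[arm] += 1.0
        total += r
    return total, score

theta, lr = 1.0, 1e-3
for _ in range(500):                   # gradient ascent on E[reward] over P
    means = rng.beta(1.0, 1.0, size=5) # instance drawn from a stand-in prior P
    total, score = episode(theta, means)
    theta += lr * total * score        # REINFORCE estimate (no baseline)
```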
Meta-learning for simple regret minimization
We develop a meta-learning framework for simple regret minimization in bandits. In this
framework, a learning agent interacts with a sequence of bandit tasks, which are sampled iid …
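For reference, simple regret (in our notation) measures the quality of the single arm recommended after exploration, rather than the cumulative losses incurred along the way:

```latex
% Simple regret of the arm \hat{J}_n recommended after n exploration rounds:
r_n \;=\; \mu_* - \mathbb{E}\!\left[\mu_{\hat{J}_n}\right],
\qquad \mu_* = \max_{1 \le i \le K} \mu_i .
```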
Restless and uncertain: Robust policies for restless bandits via deep multi-agent reinforcement learning
We introduce robustness in restless multi-armed bandits (RMABs), a popular model
for constrained resource allocation among independent stochastic processes (arms). Nearly …
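To pin down the model being made robust: in an RMAB, every arm is an independent Markov chain that transitions each round whether or not it is pulled, and the planner may activate at most k arms per round. A minimal simulator with hypothetical two-state dynamics and a placeholder random policy:

```python
import numpy as np

rng = np.random.default_rng(4)

# N two-state arms; each transitions every round, pulled or not; at most k pulls.
N, k, horizon = 8, 2, 50
P_passive = np.array([[0.9, 0.1], [0.3, 0.7]])  # hypothetical passive dynamics
P_active  = np.array([[0.4, 0.6], [0.1, 0.9]])  # hypothetical active dynamics
state = rng.integers(0, 2, size=N)              # reward = 1 while in state 1

total = 0
for _ in range(horizon):
    pulled = rng.choice(N, size=k, replace=False)   # placeholder policy: random
    total += state.sum()                            # collect reward from all arms
    for a in range(N):
        P = P_active if a in pulled else P_passive
        state[a] = rng.choice(2, p=P[state[a]])     # arm evolves regardless
```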
AExGym: Benchmarks and Environments for Adaptive Experimentation
Innovations across science and industry are evaluated using randomized trials (aka A/B
tests). While simple and robust, such static designs are inefficient or infeasible for testing …
Adaptive Experimentation at Scale: A Computational Framework for Flexible Batches
E Che, H Namkoong - arXiv preprint arXiv:2303.11582, 2023 - arxiv.org
Standard bandit algorithms that assume continual reallocation of measurement effort are
challenging to implement due to delayed feedback and infrastructural/organizational …
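The constraint the paper works around is concrete: feedback is delayed, so allocation must be committed a whole batch at a time rather than after every observation. A minimal batched Thompson sampling sketch under that constraint (arm count, batch shape, and conversion rates are hypothetical):

```python
import numpy as np

rng = np.random.default_rng(2)

K, batches, batch_size = 4, 10, 250      # hypothetical experiment shape
true_means = rng.uniform(0.02, 0.08, K)  # e.g., conversion rates
succ, fail = np.ones(K), np.ones(K)      # Beta(1, 1) posterior per arm

for _ in range(batches):
    # Decide the whole batch up front from the current posterior:
    # one posterior draw per unit, tallied into an allocation.
    draws = rng.beta(succ[:, None], fail[:, None], size=(K, batch_size))
    alloc = np.bincount(draws.argmax(axis=0), minlength=K)
    # Feedback arrives only after the batch completes (delayed feedback).
    wins = rng.binomial(alloc, true_means)
    succ += wins
    fail += alloc - wins
```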
Meta-learning bandit policies by gradient ascent
Most bandit policies are designed to either minimize regret in any problem instance, making
very few assumptions about the underlying environment, or in a Bayesian sense, assuming …
Improving Thompson Sampling via Information Relaxation for Budgeted Multi-armed Bandits
We consider a Bayesian budgeted multi-armed bandit problem, in which each arm
consumes a different amount of resources when selected and there is a budget constraint on …
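As a baseline for the setting the paper improves on, here is plain Thompson sampling for a budgeted bandit, pulling the feasible arm with the best sampled reward-to-cost ratio (costs are taken as known and deterministic for simplicity; all numbers are hypothetical):

```python
import numpy as np

rng = np.random.default_rng(3)

K = 4
true_means = rng.uniform(0.2, 0.8, K)   # hypothetical Bernoulli reward means
costs = rng.uniform(0.5, 2.0, K)        # known per-pull resource consumption
budget = 200.0
succ, fail = np.ones(K), np.ones(K)     # Beta(1, 1) posteriors

total_reward = 0.0
while True:
    feasible = costs <= budget          # arms we can still afford
    if not feasible.any():
        break
    theta = rng.beta(succ, fail)        # posterior sample per arm
    ratio = np.where(feasible, theta / costs, -np.inf)
    arm = int(np.argmax(ratio))         # best sampled reward per unit cost
    r = float(rng.random() < true_means[arm])
    succ[arm] += r; fail[arm] += 1.0 - r
    budget -= costs[arm]
    total_reward += r
```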
Advertising Media and Target Audience Optimization via High-dimensional Bandits
W Ba, JM Harrison, HS Nair - arXiv preprint arXiv:2209.08403, 2022 - arxiv.org
We present a data-driven algorithm that advertisers can use to automate their digital ad
campaigns at online publishers. The algorithm enables the advertiser to search across …