Contextual bandits with large action spaces: Made practical
A central problem in sequential decision making is to develop algorithms that are practical
and computationally efficient, yet support the use of flexible, general-purpose models …
High-dimensional experimental design and kernel bandits
R Camilleri, K Jamieson… - … on Machine Learning, 2021 - proceedings.mlr.press
In recent years methods from optimal linear experimental design have been leveraged to
obtain state of the art results for linear bandits. A design returned from an objective such as …
Multi-task representation learning for pure exploration in bilinear bandits
We study multi-task representation learning for the problem of pure exploration in bilinear
bandits. In bilinear bandits, an action takes the form of a pair of arms from two different entity …
Instance-optimality in interactive decision making: Toward a non-asymptotic theory
AJ Wagenmaker, DJ Foster - The Thirty Sixth Annual …, 2023 - proceedings.mlr.press
We consider the development of adaptive, instance-dependent algorithms for interactive
decision making (bandits, reinforcement learning, and beyond) that, rather than only …
Improved variance-aware confidence sets for linear bandits and linear mixture MDP
This paper presents new variance-aware confidence sets for linear bandits and
linear mixture Markov Decision Processes (MDPs). With the new confidence sets, we obtain …
Active learning with safety constraints
R Camilleri, A Wagenmaker… - Advances in …, 2022 - proceedings.neurips.cc
Active learning methods have shown great promise in reducing the number of samples
necessary for learning. As automated learning systems are adopted into real-time, real …
Experimental designs for heteroskedastic variance
Most linear experimental design problems assume homogeneous variance, while the
heteroskedastic noise is present in many realistic settings. Let a learner have …
Non-asymptotic analysis of a UCB-based top two algorithm
A Top Two sampling rule for bandit identification is a method which selects the next arm to
sample from among two candidate arms, a leader and a challenger. Due to their simplicity …
Improved confidence bounds for the linear logistic model and applications to bandits
We propose improved fixed-design confidence bounds for the linear logistic model. Our
bounds significantly improve upon the state-of-the-art bound by Li et al. (2017) via recent …
Variance-aware confidence set: Variance-dependent bound for linear bandits and horizon-free bound for linear mixture MDP
arXiv:2101.12745v2 [cs.LG] 19 Feb 2021
Variance-Aware Confidence Set: Variance-Dependent Bound for Linear Bandits and …