Contextual bandits with large action spaces: Made practical

Y Zhu, DJ Foster, J Langford… - … Conference on Machine …, 2022 - proceedings.mlr.press
A central problem in sequential decision making is to develop algorithms that are practical
and computationally efficient, yet support the use of flexible, general-purpose models …

High-dimensional experimental design and kernel bandits

R Camilleri, K Jamieson… - … on Machine Learning, 2021 - proceedings.mlr.press
In recent years methods from optimal linear experimental design have been leveraged to
obtain state of the art results for linear bandits. A design returned from an objective such as …

Multi-task representation learning for pure exploration in bilinear bandits

S Mukherjee, Q Xie, J Hanna… - Advances in Neural …, 2024 - proceedings.neurips.cc
We study multi-task representation learning for the problem of pure exploration in bilinear
bandits. In bilinear bandits, an action takes the form of a pair of arms from two different entity …

Instance-optimality in interactive decision making: Toward a non-asymptotic theory

AJ Wagenmaker, DJ Foster - The Thirty Sixth Annual …, 2023 - proceedings.mlr.press
We consider the development of adaptive, instance-dependent algorithms for interactive
decision making (bandits, reinforcement learning, and beyond) that, rather than only …

Improved variance-aware confidence sets for linear bandits and linear mixture mdp

Z Zhang, J Yang, X Ji, SS Du - Advances in Neural …, 2021 - proceedings.neurips.cc
This paper presents new variance-aware confidence sets for linear bandits and
linear mixture Markov Decision Processes (MDPs). With the new confidence sets, we obtain …

Active learning with safety constraints

R Camilleri, A Wagenmaker… - Advances in …, 2022 - proceedings.neurips.cc
Active learning methods have shown great promise in reducing the number of samples
necessary for learning. As automated learning systems are adopted into real-time, real …

Experimental designs for heteroskedastic variance

J Weltz, T Fiez, A Volfovsky, E Laber… - Advances in …, 2024 - proceedings.neurips.cc
Most linear experimental design problems assume homogeneous variance, while the
presence of heteroskedastic noise is present in many realistic settings. Let a learner have …

Non-asymptotic analysis of a ucb-based top two algorithm

M Jourdan, R Degenne - Advances in Neural Information …, 2024 - proceedings.neurips.cc
A Top Two sampling rule for bandit identification is a method which selects the next arm to
sample from among two candidate arms, a leader and a challenger. Due to their simplicity …

Improved confidence bounds for the linear logistic model and applications to bandits

KS Jun, L Jain, B Mason… - … Conference on Machine …, 2021 - proceedings.mlr.press
We propose improved fixed-design confidence bounds for the linear logistic model. Our
bounds significantly improve upon the state-of-the-art bound by Li et al. (2017) via recent …

Variance-aware confidence set: Variance-dependent bound for linear bandits and horizon-free bound for linear mixture mdp

Z Zhang, J Yang, X Ji, SS Du - arXiv preprint arXiv:2101.12745, 2021 - researchgate.net
arXiv:2101.12745v2 [cs.LG] 19 Feb 2021
Variance-Aware Confidence Set: Variance-Dependent Bound for Linear Bandits and …