An empirical process approach to the union bound: Practical algorithms for combinatorial...

Y Zhu, DJ Foster, J Langford… - … Conference on Machine …, 2022 - proceedings.mlr.press

A central problem in sequential decision making is to develop algorithms that are practical
and computationally efficient, yet support the use of flexible, general-purpose models …

Cited by 29; 3 versions

High-dimensional experimental design and kernel bandits

R Camilleri, K Jamieson… - … on Machine Learning, 2021 - proceedings.mlr.press

In recent years methods from optimal linear experimental design have been leveraged to
obtain state of the art results for linear bandits. A design returned from an objective such as …

Cited by 49; 5 versions

Multi-task representation learning for pure exploration in bilinear bandits

S Mukherjee, Q Xie, J Hanna… - Advances in Neural …, 2024 - proceedings.neurips.cc

We study multi-task representation learning for the problem of pure exploration in bilinear
bandits. In bilinear bandits, an action takes the form of a pair of arms from two different entity …

Cited by 4; 7 versions

Instance-optimality in interactive decision making: Toward a non-asymptotic theory

AJ Wagenmaker, DJ Foster - The Thirty Sixth Annual …, 2023 - proceedings.mlr.press

We consider the development of adaptive, instance-dependent algorithms for interactive
decision making (bandits, reinforcement learning, and beyond) that, rather than only …

Cited by 14; 3 versions

Improved variance-aware confidence sets for linear bandits and linear mixture MDP

Z Zhang, J Yang, X Ji, SS Du - Advances in Neural …, 2021 - proceedings.neurips.cc

This paper presents new variance-aware confidence sets for linear bandits and
linear mixture Markov Decision Processes (MDPs). With the new confidence sets, we obtain …

Cited by 35; 7 versions

Active learning with safety constraints

R Camilleri, A Wagenmaker… - Advances in …, 2022 - proceedings.neurips.cc

Active learning methods have shown great promise in reducing the number of samples
necessary for learning. As automated learning systems are adopted into real-time, real …

Cited by 17; 6 versions

Experimental designs for heteroskedastic variance

J Weltz, T Fiez, A Volfovsky, E Laber… - Advances in …, 2024 - proceedings.neurips.cc

Most linear experimental design problems assume homogeneous variance, while the
presence of heteroskedastic noise is present in many realistic settings. Let a learner have …

Cited by 2; 9 versions

Non-asymptotic analysis of a UCB-based Top Two algorithm

M Jourdan, R Degenne - Advances in Neural Information …, 2024 - proceedings.neurips.cc

A Top Two sampling rule for bandit identification is a method which selects the next arm to
sample from among two candidate arms, a leader and a challenger. Due to their simplicity …

Cited by 7; 9 versions

Improved confidence bounds for the linear logistic model and applications to bandits

KS Jun, L Jain, B Mason… - … Conference on Machine …, 2021 - proceedings.mlr.press

We propose improved fixed-design confidence bounds for the linear logistic model. Our
bounds significantly improve upon the state-of-the-art bound by Li et al. (2017) via recent …

Cited by 20; 3 versions

Variance-aware confidence set: Variance-dependent bound for linear bandits and horizon-free bound for linear mixture MDP

Z Zhang, J Yang, X Ji, SS Du - arXiv preprint arXiv:2101.12745, 2021 - researchgate.net

arXiv:2101.12745v2 [cs.LG], 19 Feb 2021 …

Cited by 29