Adaptive exploration in linear contextual bandit

T Lattimore, C Szepesvári - 2020 - books.google.com

Decision-making in the face of uncertainty is a significant challenge in machine learning,
and the multi-armed bandit model is a commonly used framework to address it. This …

Cited by 3213 - Related articles - All 9 versions

Instance-dependent complexity of contextual bandits and reinforcement learning: A disagreement-based perspective

DJ Foster, A Rakhlin, D Simchi-Levi, Y Xu - arXiv preprint arXiv …, 2020 - arxiv.org

In the classical multi-armed bandit problem, instance-dependent algorithms attain improved
performance on "easy" problems with a gap between the best and second-best arm. Are …

Cited by 100 - Related articles - All 4 versions

High-dimensional sparse linear bandits

B Hao, T Lattimore, M Wang - Advances in Neural …, 2020 - proceedings.neurips.cc

Stochastic linear bandits with high-dimensional sparse features are a practical model for a
variety of domains, such as personalized medicine and online advertising. We derive a …

Cited by 66 - Related articles - All 9 versions

Federated linear contextual bandits with user-level differential privacy

R Huang, H Zhang, L Melis, M Shen… - International …, 2023 - proceedings.mlr.press

This paper studies federated linear contextual bandits under the notion of user-level
differential privacy (DP). We first introduce a unified federated bandits framework that can …

Cited by 15 - Related articles - All 8 versions

Achieving near instance-optimality and minimax-optimality in stochastic and adversarial linear bandits simultaneously

CW Lee, H Luo, CY Wei, M Zhang… - … on Machine Learning, 2021 - proceedings.mlr.press

In this work, we develop linear bandit algorithms that automatically adapt to different
environments. By plugging a novel loss estimator into the optimization problem that …

Cited by 53 - Related articles - All 5 versions

Approximate allocation matching for structural causal bandits with unobserved confounders

L Wei, MQ Elahi, M Ghasemi… - Advances in Neural …, 2024 - proceedings.neurips.cc

Structural causal bandit provides a framework for online decision-making problems when
causal information is available. It models the stochastic environment with a structural causal …

Cited by 5 - Related articles - All 5 versions

Instance-optimal PAC algorithms for contextual bandits

Z Li, L Ratliff, KG Jamieson… - Advances in Neural …, 2022 - proceedings.neurips.cc

In the stochastic contextual bandit setting, regret-minimizing algorithms have been
extensively researched, but their instance-minimizing best-arm identification counterparts …

Cited by 27 - Related articles - All 12 versions

Provable model-based nonlinear bandit and reinforcement learning: Shelve optimism, embrace virtual curvature

K Dong, J Yang, T Ma - Advances in neural information …, 2021 - proceedings.neurips.cc

This paper studies model-based bandit and reinforcement learning (RL) with nonlinear
function approximations. We propose to study convergence to approximate local maxima …

Cited by 42 - Related articles - All 8 versions

Instance-optimality in interactive decision making: Toward a non-asymptotic theory

AJ Wagenmaker, DJ Foster - The Thirty Sixth Annual …, 2023 - proceedings.mlr.press

We consider the development of adaptive, instance-dependent algorithms for interactive
decision making (bandits, reinforcement learning, and beyond) that, rather than only …

Cited by 19 - Related articles - All 3 versions

Information directed sampling for linear partial monitoring

J Kirschner, T Lattimore… - Conference on Learning …, 2020 - proceedings.mlr.press

Partial monitoring is a rich framework for sequential decision making under uncertainty that
generalizes many well known bandit models, including linear, combinatorial and dueling …

Cited by 55 - Related articles - All 5 versions