[BOOK][B] Bandit algorithms

T Lattimore, C Szepesvári - 2020 - books.google.com
Decision-making in the face of uncertainty is a significant challenge in machine learning,
and the multi-armed bandit model is a commonly used framework to address it. This …

Instance-dependent complexity of contextual bandits and reinforcement learning: A disagreement-based perspective

DJ Foster, A Rakhlin, D Simchi-Levi, Y Xu - arXiv preprint arXiv …, 2020 - arxiv.org
In the classical multi-armed bandit problem, instance-dependent algorithms attain improved
performance on "easy" problems with a gap between the best and second-best arm. Are …
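A minimal sketch of the gap effect mentioned in the snippet above. It assumes the standard UCB1 index on Bernoulli arms. UCB1 is a textbook stand-in here, not the disagreement-based method of Foster et al. The function name `ucb1_suboptimal_pulls` and the arm means are hypothetical. The sketch counts how often the algorithm pulls a suboptimal arm. When the gap between the best and second-best arm is large, that count stays small. When the arms are nearly tied, they are hard to separate within the horizon.

```python
import math
import random


def ucb1_suboptimal_pulls(means, horizon, seed=0):
    """Run UCB1 on Bernoulli arms with the given means (hypothetical helper).

    Returns the number of rounds in which an arm other than the best one
    was pulled.
    """
    rng = random.Random(seed)          # seeded for reproducibility
    k = len(means)
    counts = [0] * k                   # pulls per arm
    sums = [0.0] * k                   # total reward per arm
    best = max(range(k), key=lambda i: means[i])

    for t in range(1, horizon + 1):
        if t <= k:
            arm = t - 1                # initialise: pull each arm once
        else:
            # UCB1 index: empirical mean plus exploration bonus sqrt(2 ln t / n_i)
            arm = max(
                range(k),
                key=lambda i: sums[i] / counts[i]
                + math.sqrt(2.0 * math.log(t) / counts[i]),
            )
        reward = 1.0 if rng.random() < means[arm] else 0.0  # Bernoulli draw
        counts[arm] += 1
        sums[arm] += reward

    return horizon - counts[best]


if __name__ == "__main__":
    easy = ucb1_suboptimal_pulls([0.5, 0.05], 10_000)  # gap 0.45
    hard = ucb1_suboptimal_pulls([0.5, 0.48], 10_000)  # gap 0.02
    print("easy instance, suboptimal pulls:", easy)
    print("hard instance, suboptimal pulls:", hard)
```

With the seed fixed, the "easy" instance should use far fewer suboptimal pulls than the "hard" one. This matches UCB1's gap-dependent guarantee, under which suboptimal pulls grow on the order of log(T)/Δ² for gap Δ.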

High-dimensional sparse linear bandits

B Hao, T Lattimore, M Wang - Advances in Neural …, 2020 - proceedings.neurips.cc
Stochastic linear bandits with high-dimensional sparse features are a practical model for a
variety of domains, such as personalized medicine and online advertising. We derive a …

Federated linear contextual bandits with user-level differential privacy

R Huang, H Zhang, L Melis, M Shen… - International …, 2023 - proceedings.mlr.press
This paper studies federated linear contextual bandits under the notion of user-level
differential privacy (DP). We first introduce a unified federated bandits framework that can …

Achieving near instance-optimality and minimax-optimality in stochastic and adversarial linear bandits simultaneously

CW Lee, H Luo, CY Wei, M Zhang… - … on Machine Learning, 2021 - proceedings.mlr.press
In this work, we develop linear bandit algorithms that automatically adapt to different
environments. By plugging a novel loss estimator into the optimization problem that …

Approximate allocation matching for structural causal bandits with unobserved confounders

L Wei, MQ Elahi, M Ghasemi… - Advances in Neural …, 2024 - proceedings.neurips.cc
Structural causal bandit provides a framework for online decision-making problems when
causal information is available. It models the stochastic environment with a structural causal …

Instance-optimal PAC algorithms for contextual bandits

Z Li, L Ratliff, KG Jamieson… - Advances in Neural …, 2022 - proceedings.neurips.cc
In the stochastic contextual bandit setting, regret-minimizing algorithms have been
extensively researched, but their instance-minimizing best-arm identification counterparts …

Provable model-based nonlinear bandit and reinforcement learning: Shelve optimism, embrace virtual curvature

K Dong, J Yang, T Ma - Advances in neural information …, 2021 - proceedings.neurips.cc
This paper studies model-based bandit and reinforcement learning (RL) with nonlinear
function approximations. We propose to study convergence to approximate local maxima …

Instance-optimality in interactive decision making: Toward a non-asymptotic theory

AJ Wagenmaker, DJ Foster - The Thirty Sixth Annual …, 2023 - proceedings.mlr.press
We consider the development of adaptive, instance-dependent algorithms for interactive
decision making (bandits, reinforcement learning, and beyond) that, rather than only …

Information directed sampling for linear partial monitoring

J Kirschner, T Lattimore… - Conference on Learning …, 2020 - proceedings.mlr.press
Partial monitoring is a rich framework for sequential decision making under uncertainty that
generalizes many well known bandit models, including linear, combinatorial and dueling …