[BOOK][B] Bandit algorithms
T Lattimore, C Szepesvári - 2020 - books.google.com
Decision-making in the face of uncertainty is a significant challenge in machine learning,
and the multi-armed bandit model is a commonly used framework to address it. This …
Instance-dependent complexity of contextual bandits and reinforcement learning: A disagreement-based perspective
In the classical multi-armed bandit problem, instance-dependent algorithms attain improved
performance on "easy" problems with a gap between the best and second-best arm. Are …
High-dimensional sparse linear bandits
Stochastic linear bandits with high-dimensional sparse features are a practical model for a
variety of domains, such as personalized medicine and online advertising. We derive a …
Federated linear contextual bandits with user-level differential privacy
This paper studies federated linear contextual bandits under the notion of user-level
differential privacy (DP). We first introduce a unified federated bandits framework that can …
Achieving near instance-optimality and minimax-optimality in stochastic and adversarial linear bandits simultaneously
In this work, we develop linear bandit algorithms that automatically adapt to different
environments. By plugging a novel loss estimator into the optimization problem that …
Approximate allocation matching for structural causal bandits with unobserved confounders
Structural causal bandit provides a framework for online decision-making problems when
causal information is available. It models the stochastic environment with a structural causal …
Instance-optimal PAC algorithms for contextual bandits
In the stochastic contextual bandit setting, regret-minimizing algorithms have been
extensively researched, but their instance-minimizing best-arm identification counterparts …
Provable model-based nonlinear bandit and reinforcement learning: Shelve optimism, embrace virtual curvature
This paper studies model-based bandit and reinforcement learning (RL) with nonlinear
function approximations. We propose to study convergence to approximate local maxima …
Instance-optimality in interactive decision making: Toward a non-asymptotic theory
AJ Wagenmaker, DJ Foster - The Thirty Sixth Annual …, 2023 - proceedings.mlr.press
We consider the development of adaptive, instance-dependent algorithms for interactive
decision making (bandits, reinforcement learning, and beyond) that, rather than only …
Information directed sampling for linear partial monitoring
J Kirschner, T Lattimore… - Conference on Learning …, 2020 - proceedings.mlr.press
Partial monitoring is a rich framework for sequential decision making under uncertainty that
generalizes many well known bandit models, including linear, combinatorial and dueling …