[Book] Bandit algorithms
T Lattimore, C Szepesvári - 2020 - books.google.com
Decision-making in the face of uncertainty is a significant challenge in machine learning,
and the multi-armed bandit model is a commonly used framework to address it. This …
Learning with good feature representations in bandits and in RL with a generative model
T Lattimore, C Szepesvari… - … conference on machine …, 2020 - proceedings.mlr.press
The construction in the recent paper by Du et al. [2019] implies that searching for a near-optimal action in a bandit sometimes requires examining essentially all the actions, even if …
Top two algorithms revisited
Top two algorithms arose as an adaptation of Thompson sampling to best arm identification
in multi-armed bandit models for parametric families of arms. They select the next arm to …
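The leader/challenger mechanism this abstract alludes to can be sketched as follows. This is a rough illustration, not the paper's exact rule: the Gaussian posterior model, the parameter `beta`, and the resampling loop are standard Top-Two Thompson Sampling conventions assumed here for concreteness.

```python
import random

def top_two_sample(means, counts, beta=0.5, sigma=1.0):
    """One step of a Top-Two sampling rule (illustrative sketch).

    `means`/`counts` are empirical means and pull counts per arm.
    A leader is drawn by Thompson sampling from Gaussian posteriors;
    with probability `beta` the leader is played, otherwise posterior
    draws are repeated until a different arm (the challenger) wins.
    """
    k = len(means)

    def posterior_argmax():
        # Draw one value per arm from N(mean, sigma^2 / count) and
        # return the index of the largest draw.
        draws = [random.gauss(means[i], sigma / max(counts[i], 1) ** 0.5)
                 for i in range(k)]
        return max(range(k), key=draws.__getitem__)

    leader = posterior_argmax()
    if random.random() < beta:
        return leader
    challenger = posterior_argmax()
    while challenger == leader:      # resample until a distinct arm wins
        challenger = posterior_argmax()
    return challenger
```

Because the challenger is forced to differ from the leader, the rule keeps probing plausible runner-up arms instead of collapsing onto the empirical best, which is what makes it suitable for best-arm identification.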
Gamification of pure exploration for linear bandits
We investigate an active pure-exploration setting that includes best-arm identification, in the context of linear stochastic bandits. While asymptotically optimal …
Mixture martingales revisited with applications to sequential tests and confidence intervals
E Kaufmann, WM Koolen - Journal of Machine Learning Research, 2021 - jmlr.org
This paper presents new deviation inequalities that are valid uniformly in time under
adaptive sampling in a multi-armed bandit model. The deviations are measured using the …
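For contrast with the mixture-martingale bounds this paper develops, here is the crude baseline they improve on: a time-uniform Hoeffding radius obtained by a union bound over all times, allocating delta_t = delta / (t (t + 1)) so the error probabilities sum to delta. The function name and the sub-Gaussian parameterization are assumptions for the sketch.

```python
import math

def anytime_radius(t, delta, sigma=1.0):
    """Two-sided confidence radius for the mean of sigma-sub-Gaussian
    samples after t observations, valid uniformly over all t.

    Uses Hoeffding's inequality at each time with failure probability
    delta_t = delta / (t * (t + 1)); since these sum to delta over all
    t >= 1, the resulting intervals hold simultaneously for every time.
    Much looser than mixture-martingale bounds, but simple and valid.
    """
    delta_t = delta / (t * (t + 1))
    return sigma * math.sqrt(2.0 * math.log(2.0 / delta_t) / t)
```

The mixture (method-of-mixtures) bounds sharpen the `log` term's dependence on `t`, which matters for the stopping rules of sequential tests.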
Fast pure exploration via Frank-Wolfe
We study the problem of active pure exploration with fixed confidence in generic stochastic
bandit environments. The goal of the learner is to answer a query about the environment …
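Frank-Wolfe is natural here because the optimal sampling allocation lives on the probability simplex, where the linear maximization oracle is trivial (pick one arm). A generic sketch of the update, not the paper's specific algorithm:

```python
def frank_wolfe_simplex(grad, dim, steps=200):
    """Frank-Wolfe ascent on the probability simplex (illustrative).

    Maximizes a smooth concave objective over allocations w >= 0 with
    sum(w) = 1. Each iteration moves toward the vertex (pure arm) whose
    gradient coordinate is largest, so iterates stay feasible without
    any projection step.
    """
    w = [1.0 / dim] * dim            # start from the uniform allocation
    for t in range(1, steps + 1):
        g = grad(w)
        best = max(range(dim), key=g.__getitem__)  # linear maximization oracle
        gamma = 2.0 / (t + 2)                      # standard step size
        w = [(1.0 - gamma) * wi for wi in w]
        w[best] += gamma
    return w
```

Projection-free updates are the appeal: each step costs one gradient evaluation and one argmax, which keeps per-round computation cheap in a sequential sampling loop.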
Fixed-confidence guarantees for Bayesian best-arm identification
We investigate and provide new insights on the sampling rule called Top-Two Thompson
Sampling (TTTS). In particular, we justify its use for fixed-confidence best-arm identification …
Adaptive exploration in linear contextual bandit
B Hao, T Lattimore… - … Conference on Artificial …, 2020 - proceedings.mlr.press
Contextual bandits serve as a fundamental model for many sequential decision making
tasks. The most popular theoretically justified approaches are based on the optimism …
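The optimism principle the snippet refers to is easiest to see in the plain multi-armed case (not the linear contextual variant this paper studies): play the arm maximizing an upper confidence bound on its mean, here a UCB1-style index assumed for illustration.

```python
import math

def ucb1_index(mean, count, t, c=2.0):
    """Optimism in the face of uncertainty: an upper confidence bound
    on an arm's mean reward (UCB1-style index, illustrative sketch)."""
    if count == 0:
        return float("inf")          # force one initial pull of every arm
    return mean + math.sqrt(c * math.log(t) / count)

def choose_arm(means, counts, t):
    """Play the arm whose upper confidence bound is largest."""
    return max(range(len(means)),
               key=lambda i: ucb1_index(means[i], counts[i], t))
```

Under-sampled arms get a large bonus term, so the index automatically balances exploring uncertain arms against exploiting the empirically best one.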
On the existence of a complexity in fixed budget bandit identification
R Degenne - The Thirty Sixth Annual Conference on …, 2023 - proceedings.mlr.press
In fixed budget bandit identification, an algorithm sequentially observes samples from
several distributions up to a given final time. It then answers a query about the set of …
Instance-optimality in interactive decision making: Toward a non-asymptotic theory
AJ Wagenmaker, DJ Foster - The Thirty Sixth Annual …, 2023 - proceedings.mlr.press
We consider the development of adaptive, instance-dependent algorithms for interactive
decision making (bandits, reinforcement learning, and beyond) that, rather than only …