[Book] Bandit algorithms
T Lattimore, C Szepesvári - 2020 - books.google.com
Decision-making in the face of uncertainty is a significant challenge in machine learning,
and the multi-armed bandit model is a commonly used framework to address it. This …
Introduction to multi-armed bandits
A Slivkins - Foundations and Trends® in Machine Learning, 2019 - nowpublishers.com
Multi-armed bandits is a simple but very powerful framework for algorithms that make
decisions over time under uncertainty. An enormous body of work has accumulated over the …
Contextual decision processes with low bellman rank are pac-learnable
This paper studies systematic exploration for reinforcement learning (RL) with rich
observations and function approximation. We introduce contextual decision processes …
Beyond ucb: Optimal and efficient contextual bandits with regression oracles
A fundamental challenge in contextual bandits is to develop flexible, general-purpose
algorithms with computational requirements no worse than classical supervised learning …
Regret analysis of stochastic and nonstochastic multi-armed bandit problems
S Bubeck, N Cesa-Bianchi - Foundations and Trends® in …, 2012 - nowpublishers.com
Multi-armed bandit problems are the most basic examples of sequential decision problems
with an exploration-exploitation trade-off. This is the balance between staying with the option …
Taming the monster: A fast and simple algorithm for contextual bandits
We present a new algorithm for the contextual bandit learning problem, where the learner
repeatedly takes one of K actions in response to the observed context, and …
Bandits with knapsacks
Multi-armed bandit problems are the predominant theoretical model of exploration-
exploitation tradeoffs in learning, and they have countless applications ranging from medical …
Adaptive treatment assignment in experiments for policy choice
M Kasy, A Sautmann - Econometrica, 2021 - Wiley Online Library
Standard experimental designs are geared toward point estimation and hypothesis testing,
while bandit algorithms are geared toward in‐sample outcomes. Here, we instead consider …
Adapting to misspecification in contextual bandits
A major research direction in contextual bandits is to develop algorithms that are
computationally efficient, yet support flexible, general-purpose function approximation …
[PDF] Parallelizing exploration-exploitation tradeoffs in Gaussian process bandit optimization
T Desautels, A Krause, JW Burdick - J. Mach. Learn. Res., 2014 - jmlr.org
How can we take advantage of opportunities for experimental parallelization in
exploration-exploitation tradeoffs? In many experimental scenarios, it is often desirable to …