[BOOK][B] Bandit algorithms

T Lattimore, C Szepesvári - 2020 - books.google.com
Decision-making in the face of uncertainty is a significant challenge in machine learning,
and the multi-armed bandit model is a commonly used framework to address it. This …

Introduction to multi-armed bandits

A Slivkins - Foundations and Trends® in Machine Learning, 2019 - nowpublishers.com
Multi-armed bandits are a simple but very powerful framework for algorithms that make
decisions over time under uncertainty. An enormous body of work has accumulated over the …

Efficient exploration through Bayesian deep Q-networks

K Azizzadenesheli, E Brunskill… - 2018 Information …, 2018 - ieeexplore.ieee.org
We propose Bayesian Deep Q-Network (BDQN), a practical Thompson sampling based
Reinforcement Learning (RL) Algorithm. Thompson sampling allows for targeted exploration …

Learning to optimize via information-directed sampling

D Russo, B Van Roy - Advances in neural information …, 2014 - proceedings.neurips.cc
We propose information-directed sampling--a new algorithm for online optimization
problems in which a decision-maker must balance between exploration and exploitation …

Causal bandits: Learning good interventions via causal inference

F Lattimore, T Lattimore… - Advances in neural …, 2016 - proceedings.neurips.cc
We study the problem of using causal models to improve the rate at which good
interventions can be learned online in a stochastic environment. Our formalism combines …

Learning to optimize via information-directed sampling

D Russo, B Van Roy - Operations Research, 2018 - pubsonline.informs.org
We propose information-directed sampling—a new approach to online optimization
problems in which a decision maker must balance between exploration and exploitation …

Online learning with feedback graphs: Beyond bandits

N Alon, N Cesa-Bianchi, O Dekel… - … on Learning Theory, 2015 - proceedings.mlr.press
We study a general class of online learning problems where the feedback is specified by a
graph. This class includes online prediction with expert advice and the multi-armed bandit …

High-dimensional sparse linear bandits

B Hao, T Lattimore, M Wang - Advances in Neural …, 2020 - proceedings.neurips.cc
Stochastic linear bandits with high-dimensional sparse features are a practical model for a
variety of domains, such as personalized medicine and online advertising. We derive a …

The end of optimism? An asymptotic analysis of finite-armed linear bandits

T Lattimore, C Szepesvári - Artificial Intelligence and …, 2017 - proceedings.mlr.press
Stochastic linear bandits are a natural and simple generalisation of finite-armed bandits with
numerous practical applications. Current approaches focus on generalising existing …

Preference-based online learning with dueling bandits: A survey

V Bengs, R Busa-Fekete, A El Mesaoudi-Paul… - Journal of Machine …, 2021 - jmlr.org
In machine learning, the notion of multi-armed bandits refers to a class of online learning
problems, in which an agent is supposed to simultaneously explore and exploit a given set …