[BOOK][B] Bandit algorithms
T Lattimore, C Szepesvári - 2020 - books.google.com
Decision-making in the face of uncertainty is a significant challenge in machine learning,
and the multi-armed bandit model is a commonly used framework to address it. This …
Introduction to multi-armed bandits
A Slivkins - Foundations and Trends® in Machine Learning, 2019 - nowpublishers.com
Multi-armed bandits are a simple but very powerful framework for algorithms that make
decisions over time under uncertainty. An enormous body of work has accumulated over the …
Efficient exploration through Bayesian deep Q-networks
K Azizzadenesheli, E Brunskill… - 2018 Information …, 2018 - ieeexplore.ieee.org
We propose Bayesian Deep Q-Network (BDQN), a practical Thompson sampling based
Reinforcement Learning (RL) Algorithm. Thompson sampling allows for targeted exploration …
Learning to optimize via information-directed sampling
We propose information-directed sampling--a new algorithm for online optimization
problems in which a decision-maker must balance between exploration and exploitation …
Causal bandits: Learning good interventions via causal inference
F Lattimore, T Lattimore… - Advances in neural …, 2016 - proceedings.neurips.cc
We study the problem of using causal models to improve the rate at which good
interventions can be learned online in a stochastic environment. Our formalism combines …
Learning to optimize via information-directed sampling
We propose information-directed sampling—a new approach to online optimization
problems in which a decision maker must balance between exploration and exploitation …
Online learning with feedback graphs: Beyond bandits
We study a general class of online learning problems where the feedback is specified by a
graph. This class includes online prediction with expert advice and the multi-armed bandit …
High-dimensional sparse linear bandits
Stochastic linear bandits with high-dimensional sparse features are a practical model for a
variety of domains, such as personalized medicine and online advertising. We derive a …
The end of optimism? an asymptotic analysis of finite-armed linear bandits
T Lattimore, C Szepesvari - Artificial Intelligence and …, 2017 - proceedings.mlr.press
Stochastic linear bandits are a natural and simple generalisation of finite-armed bandits with
numerous practical applications. Current approaches focus on generalising existing …
Preference-based online learning with dueling bandits: A survey
In machine learning, the notion of multi-armed bandits refers to a class of online learning
problems, in which an agent is supposed to simultaneously explore and exploit a given set …