Partial monitoring—classification, regret bounds, and algorithms

T Lattimore, C Szepesvári - 2020 - books.google.com

Decision-making in the face of uncertainty is a significant challenge in machine learning,
and the multi-armed bandit model is a commonly used framework to address it. This …

Cited by 2981 · Related articles · All 9 versions

[PDF] nowpublishers.com

Introduction to multi-armed bandits

A Slivkins - Foundations and Trends® in Machine Learning, 2019 - nowpublishers.com

Multi-armed bandits are a simple but very powerful framework for algorithms that make
decisions over time under uncertainty. An enormous body of work has accumulated over the …

Cited by 1108 · Related articles · All 7 versions

[PDF] arxiv.org

Efficient exploration through Bayesian deep Q-networks

K Azizzadenesheli, E Brunskill… - 2018 Information …, 2018 - ieeexplore.ieee.org

We propose Bayesian Deep Q-Network (BDQN), a practical Thompson sampling based
Reinforcement Learning (RL) Algorithm. Thompson sampling allows for targeted exploration …

Cited by 203 · Related articles · All 13 versions

[PDF] neurips.cc

Learning to optimize via information-directed sampling

D Russo, B Van Roy - Advances in neural information …, 2014 - proceedings.neurips.cc

We propose information-directed sampling, a new algorithm for online optimization
problems in which a decision-maker must balance between exploration and exploitation …

Cited by 243 · Related articles · All 10 versions

[PDF] neurips.cc

Causal bandits: Learning good interventions via causal inference

F Lattimore, T Lattimore… - Advances in neural …, 2016 - proceedings.neurips.cc

We study the problem of using causal models to improve the rate at which good
interventions can be learned online in a stochastic environment. Our formalism combines …

Cited by 177 · Related articles · All 9 versions

[PDF] informs.org

Learning to optimize via information-directed sampling

D Russo, B Van Roy - Operations Research, 2018 - pubsonline.informs.org

We propose information-directed sampling—a new approach to online optimization
problems in which a decision maker must balance between exploration and exploitation …

Cited by 136 · Related articles · All 6 versions

[PDF] mlr.press

Online learning with feedback graphs: Beyond bandits

N Alon, N Cesa-Bianchi, O Dekel… - … on Learning Theory, 2015 - proceedings.mlr.press

We study a general class of online learning problems where the feedback is specified by a
graph. This class includes online prediction with expert advice and the multi-armed bandit …

Cited by 176 · Related articles · All 12 versions

[PDF] neurips.cc

High-dimensional sparse linear bandits

B Hao, T Lattimore, M Wang - Advances in Neural …, 2020 - proceedings.neurips.cc

Stochastic linear bandits with high-dimensional sparse features are a practical model for a
variety of domains, such as personalized medicine and online advertising. We derive a …

Cited by 61 · Related articles · All 9 versions

[PDF] mlr.press

The end of optimism? An asymptotic analysis of finite-armed linear bandits

T Lattimore, C Szepesvari - Artificial Intelligence and …, 2017 - proceedings.mlr.press

Stochastic linear bandits are a natural and simple generalisation of finite-armed bandits with
numerous practical applications. Current approaches focus on generalising existing …

Cited by 139 · Related articles · All 8 versions

[PDF] jmlr.org

Preference-based online learning with dueling bandits: A survey

V Bengs, R Busa-Fekete, A El Mesaoudi-Paul… - Journal of Machine …, 2021 - jmlr.org

In machine learning, the notion of multi-armed bandits refers to a class of online learning
problems, in which an agent is supposed to simultaneously explore and exploit a given set …

Cited by 110 · Related articles · All 7 versions