Sparse dueling bandits

[BOOK][B] Bandit algorithms

T Lattimore, C Szepesvári - 2020 - books.google.com

Decision-making in the face of uncertainty is a significant challenge in machine learning,
and the multi-armed bandit model is a commonly used framework to address it. This …

Cited by 2852 Related articles All 9 versions


Simple, robust and optimal ranking from pairwise comparisons

NB Shah, MJ Wainwright - Journal of machine learning research, 2018 - jmlr.org

We consider data in the form of pairwise comparisons of n items, with the goal of identifying
the top k items for some value of k < n, or alternatively, recovering a ranking of all the items …

Cited by 202 Related articles All 6 versions


Preference-based online learning with dueling bandits: A survey

V Bengs, R Busa-Fekete, A El Mesaoudi-Paul… - Journal of Machine …, 2021 - jmlr.org

In machine learning, the notion of multi-armed bandits refers to a class of online learning
problems, in which an agent is supposed to simultaneously explore and exploit a given set …

Cited by 99 Related articles All 7 versions


Efficient and optimal algorithms for contextual dueling bandits under realizability

A Saha, A Krishnamurthy - International Conference on …, 2022 - proceedings.mlr.press

We study the $K$-armed contextual dueling bandit problem, a sequential decision making
setting in which the learner uses contextual information to make two decisions, but only …

Cited by 34 Related articles All 3 versions


Versatile dueling bandits: Best-of-both world analyses for learning from relative preferences

A Saha, P Gaillard - International Conference on Machine …, 2022 - proceedings.mlr.press

We study the problem of $K$-armed dueling bandit for both stochastic and adversarial
environments, where the goal of the learner is to aggregate information through relative …

Cited by 19 Related articles All 2 versions


Preferential Bayesian optimization

J González, Z Dai, A Damianou… - … on Machine Learning, 2017 - proceedings.mlr.press

Bayesian optimization (BO) has emerged during the last few years as an effective approach
to optimize black-box functions where direct queries of the objective are expensive. We …

Cited by 113 Related articles All 8 versions


Active ranking from pairwise comparisons and when parametric assumptions do not help

R Heckel, NB Shah, K Ramchandran, MJ Wainwright - 2019 - projecteuclid.org

The Annals of Statistics, 2019, Vol. 47, No. 6, 3099–3126. https://doi.org/10.1214/18-AOS1772 …

Cited by 122 Related articles All 9 versions


Double Thompson sampling for dueling bandits

H Wu, X Liu - Advances in neural information processing …, 2016 - proceedings.neurips.cc

In this paper, we propose a Double Thompson Sampling (D-TS) algorithm for dueling bandit
problems. As its name suggests, D-TS selects both the first and the second candidates …

Cited by 96 Related articles All 7 versions


Multi-dueling bandits with dependent arms

Y Sui, V Zhuang, JW Burdick, Y Yue - arXiv preprint arXiv:1705.00253, 2017 - arxiv.org

The dueling bandits problem is an online learning framework for learning from pairwise
preference feedback, and is particularly well-suited for modeling settings that elicit …

Cited by 88 Related articles All 9 versions


[PDF] Advancements in Dueling Bandits.

Y Sui, M Zoghi, K Hofmann, Y Yue - IJCAI, 2018 - ijcai.org

The dueling bandits problem is an online learning framework where learning happens “on-the-fly”
through preference feedback, i.e., from comparisons between a pair of actions. Unlike …

Cited by 68 Related articles All 5 versions