PAC rank elicitation through adaptive sampling of stochastic pairwise preferences

V Bengs, R Busa-Fekete, A El Mesaoudi-Paul… - Journal of Machine …, 2021 - jmlr.org

In machine learning, the notion of multi-armed bandits refers to a class of online learning
problems, in which an agent is supposed to simultaneously explore and exploit a given set …

Cited by 102 · Related articles · All 7 versions

Versatile dueling bandits: Best-of-both world analyses for learning from relative preferences

A Saha, P Gaillard - International Conference on Machine …, 2022 - proceedings.mlr.press

We study the problem of $K$-armed dueling bandits for both stochastic and adversarial
environments, where the goal of the learner is to aggregate information through relative …

Cited by 20 · Related articles · All 2 versions

Copeland dueling bandits

M Zoghi, ZS Karnin, S Whiteson… - Advances in neural …, 2015 - proceedings.neurips.cc

A version of the dueling bandit problem is addressed in which a Condorcet winner may not
exist. Two algorithms are proposed that instead seek to minimize regret with respect to the …

Cited by 105 · Related articles · All 13 versions

Online rank elicitation for Plackett-Luce: A dueling bandits approach

B Szörényi, R Busa-Fekete, A Paul… - Advances in neural …, 2015 - proceedings.neurips.cc

We study the problem of online rank elicitation, assuming that rankings of a set of
alternatives obey the Plackett-Luce distribution. Following the setting of the dueling bandits …

Cited by 99 · Related articles · All 13 versions

Maximum selection and ranking under noisy comparisons

M Falahatgar, A Orlitsky, V Pichapati… - … on Machine Learning, 2017 - proceedings.mlr.press

We consider $(\epsilon,\delta)$-PAC maximum-selection and ranking using
pairwise comparisons for general probabilistic models whose comparison probabilities …

Cited by 66 · Related articles · All 5 versions

Maxing and ranking with few assumptions

M Falahatgar, Y Hao, A Orlitsky… - Advances in …, 2017 - proceedings.neurips.cc

PAC maximum selection (maxing) and ranking of $n$ elements via random pairwise
comparisons have diverse applications and have been studied under many models and …

Cited by 49 · Related articles · All 5 versions

A survey of preference-based online learning with bandit algorithms

R Busa-Fekete, E Hüllermeier - … , ALT 2014, Bled, Slovenia, October 8-10 …, 2014 - Springer

In machine learning, the notion of multi-armed bandits refers to a class of online learning
problems, in which an agent is supposed to simultaneously explore and exploit a given set …

Cited by 65 · Related articles · All 6 versions

PAC battling bandits in the Plackett-Luce model

A Saha, A Gopalan - Algorithmic Learning Theory, 2019 - proceedings.mlr.press

We introduce the probably approximately correct (PAC) Battling-Bandit problem with
the Plackett-Luce (PL) subset choice model, an online learning framework where at each …

Cited by 36 · Related articles · All 4 versions

Borda regret minimization for generalized linear dueling bandits

Y Wu, T Jin, H Lou, F Farnoud, Q Gu - arXiv preprint arXiv:2303.08816, 2023 - arxiv.org

Dueling bandits are widely used to model preferential feedback prevalent in many
applications such as recommendation systems and ranking. In this paper, we study the …

Cited by 6 · Related articles · All 4 versions

The limits of maxing, ranking, and preference learning

M Falahatgar, A Jain, A Orlitsky… - International …, 2018 - proceedings.mlr.press

We present a comprehensive understanding of three important problems in PAC preference
learning: maximum selection (maxing), ranking, and estimating all pairwise preference …

Cited by 36 · Related articles · All 3 versions