Preference-based online learning with dueling bandits: A survey

V Bengs, R Busa-Fekete, A El Mesaoudi-Paul… - Journal of Machine …, 2021 - jmlr.org
In machine learning, the notion of multi-armed bandits refers to a class of online learning
problems, in which an agent is supposed to simultaneously explore and exploit a given set …

Versatile dueling bandits: Best-of-both world analyses for learning from relative preferences

A Saha, P Gaillard - International Conference on Machine …, 2022 - proceedings.mlr.press
We study the problem of $K$-armed dueling bandit for both stochastic and adversarial
environments, where the goal of the learner is to aggregate information through relative …

Copeland dueling bandits

M Zoghi, ZS Karnin, S Whiteson… - Advances in neural …, 2015 - proceedings.neurips.cc
A version of the dueling bandit problem is addressed in which a Condorcet winner may not
exist. Two algorithms are proposed that instead seek to minimize regret with respect to the …

Online rank elicitation for Plackett-Luce: A dueling bandits approach

B Szörényi, R Busa-Fekete, A Paul… - Advances in neural …, 2015 - proceedings.neurips.cc
We study the problem of online rank elicitation, assuming that rankings of a set of
alternatives obey the Plackett-Luce distribution. Following the setting of the dueling bandits …

Maximum selection and ranking under noisy comparisons

M Falahatgar, A Orlitsky, V Pichapati… - … on Machine Learning, 2017 - proceedings.mlr.press
We consider $(\epsilon,\delta)$-PAC maximum-selection and ranking using
pairwise comparisons for general probabilistic models whose comparison probabilities …

Maxing and ranking with few assumptions

M Falahatgar, Y Hao, A Orlitsky… - Advances in …, 2017 - proceedings.neurips.cc
PAC maximum selection (maxing) and ranking of $n$ elements via random pairwise
comparisons have diverse applications and have been studied under many models and …

A survey of preference-based online learning with bandit algorithms

R Busa-Fekete, E Hüllermeier - … , ALT 2014, Bled, Slovenia, October 8-10 …, 2014 - Springer
In machine learning, the notion of multi-armed bandits refers to a class of online learning
problems, in which an agent is supposed to simultaneously explore and exploit a given set …

PAC battling bandits in the Plackett-Luce model

A Saha, A Gopalan - Algorithmic Learning Theory, 2019 - proceedings.mlr.press
We introduce the probably approximately correct (PAC) Battling-Bandit problem with
the Plackett-Luce (PL) subset choice model–an online learning framework where at each …

Borda regret minimization for generalized linear dueling bandits

Y Wu, T Jin, H Lou, F Farnoud, Q Gu - arXiv preprint arXiv:2303.08816, 2023 - arxiv.org
Dueling bandits are widely used to model preferential feedback prevalent in many
applications such as recommendation systems and ranking. In this paper, we study the …

The limits of maxing, ranking, and preference learning

M Falahatgar, A Jain, A Orlitsky… - International …, 2018 - proceedings.mlr.press
We present a comprehensive understanding of three important problems in PAC preference
learning: maximum selection (maxing), ranking, and estimating all pairwise preference …