Preference-based online learning with dueling bandits: A survey
In machine learning, the notion of multi-armed bandits refers to a class of online learning
problems, in which an agent is supposed to simultaneously explore and exploit a given set …
Versatile dueling bandits: Best-of-both world analyses for learning from relative preferences
A Saha, P Gaillard - International Conference on Machine …, 2022 - proceedings.mlr.press
We study the problem of $K$-armed dueling bandit for both stochastic and adversarial
environments, where the goal of the learner is to aggregate information through relative …
Copeland dueling bandits
A version of the dueling bandit problem is addressed in which a Condorcet winner may not
exist. Two algorithms are proposed that instead seek to minimize regret with respect to the …
Online rank elicitation for Plackett-Luce: A dueling bandits approach
We study the problem of online rank elicitation, assuming that rankings of a set of
alternatives obey the Plackett-Luce distribution. Following the setting of the dueling bandits …
Maximum selection and ranking under noisy comparisons
We consider $(\epsilon,\delta)$-PAC maximum-selection and ranking using
pairwise comparisons for general probabilistic models whose comparison probabilities …
Maxing and ranking with few assumptions
PAC maximum selection (maxing) and ranking of $n$ elements via random pairwise
comparisons have diverse applications and have been studied under many models and …
A survey of preference-based online learning with bandit algorithms
R Busa-Fekete, E Hüllermeier - … , ALT 2014, Bled, Slovenia, October 8-10 …, 2014 - Springer
In machine learning, the notion of multi-armed bandits refers to a class of online learning
problems, in which an agent is supposed to simultaneously explore and exploit a given set …
PAC battling bandits in the Plackett-Luce model
We introduce the probably approximately correct (PAC) Battling-Bandit problem with
the Plackett-Luce (PL) subset choice model, an online learning framework where at each …
Borda regret minimization for generalized linear dueling bandits
Dueling bandits are widely used to model preferential feedback prevalent in many
applications such as recommendation systems and ranking. In this paper, we study the …
The limits of maxing, ranking, and preference learning
We present a comprehensive understanding of three important problems in PAC preference
learning: maximum selection (maxing), ranking, and estimating all pairwise preference …