Preference-based online learning with dueling bandits: A survey
In machine learning, the notion of multi-armed bandits refers to a class of online learning
problems, in which an agent is supposed to simultaneously explore and exploit a given set …
Efficient and optimal algorithms for contextual dueling bandits under realizability
A Saha, A Krishnamurthy - International Conference on …, 2022 - proceedings.mlr.press
We study the $ K $-armed contextual dueling bandit problem, a sequential decision making
setting in which the learner uses contextual information to make two decisions, but only …
Versatile dueling bandits: Best-of-both world analyses for learning from relative preferences
A Saha, P Gaillard - International Conference on Machine …, 2022 - proceedings.mlr.press
We study the problem of $ K $-armed dueling bandit for both stochastic and adversarial
environments, where the goal of the learner is to aggregate information through relative …
Optimal algorithms for stochastic contextual preference bandits
A Saha - Advances in Neural Information Processing …, 2021 - proceedings.neurips.cc
We consider the problem of preference bandits in the contextual setting. At each round, the
learner is presented with a context set of $ K $ items, chosen randomly from a potentially …
Comparison-based conversational recommender system with relative bandit feedback
With the recent advances of conversational recommendations, the recommender system is
able to actively and dynamically elicit user preference via conversational interactions. To …
Stochastic contextual dueling bandits under linear stochastic transitivity models
We consider the regret minimization task in a dueling bandits problem with context
information. In every round of the sequential decision problem, the learner makes a context …
Batched dueling bandits
The K-armed dueling bandit problem, where the feedback is in the form of noisy pairwise
comparisons, has been widely studied. Previous works have only focused on the sequential …
Think Before You Duel: Understanding Complexities of Preference Learning under Constrained Resources
We consider the problem of reward maximization in the dueling bandit setup along with
constraints on resource consumption. As in the classic dueling bandits, at each round the …
Choice bandits
A Agarwal, N Johnson… - Advances in Neural …, 2020 - proceedings.neurips.cc
There has been much interest in recent years in the problem of dueling bandits, where on
each round the learner plays a pair of arms and receives as feedback the outcome of a …
Anaconda: An improved dynamic regret algorithm for adaptive non-stationary dueling bandits
TK Buening, A Saha - International Conference on Artificial …, 2023 - proceedings.mlr.press
We study the problem of non-stationary dueling bandits and provide the first adaptive
dynamic regret algorithm for this problem. The only two existing attempts in this line of work …