Combinatorial bandits with relative feedback

V Bengs, R Busa-Fekete, A El Mesaoudi-Paul… - Journal of Machine …, 2021 - jmlr.org

In machine learning, the notion of multi-armed bandits refers to a class of online learning
problems, in which an agent is supposed to simultaneously explore and exploit a given set …

Cited by 119 Related articles All 7 versions

[PDF] mlr.press

Efficient and optimal algorithms for contextual dueling bandits under realizability

A Saha, A Krishnamurthy - International Conference on …, 2022 - proceedings.mlr.press

We study the $K$-armed contextual dueling bandit problem, a sequential decision making
setting in which the learner uses contextual information to make two decisions, but only …

Cited by 39 Related articles All 3 versions

[PDF] mlr.press

Versatile dueling bandits: Best-of-both world analyses for learning from relative preferences

A Saha, P Gaillard - International Conference on Machine …, 2022 - proceedings.mlr.press

We study the problem of $K$-armed dueling bandit for both stochastic and adversarial
environments, where the goal of the learner is to aggregate information through relative …

Cited by 23 Related articles All 2 versions

[PDF] neurips.cc

Optimal algorithms for stochastic contextual preference bandits

A Saha - Advances in Neural Information Processing …, 2021 - proceedings.neurips.cc

We consider the problem of preference bandits in the contextual setting. At each round, the
learner is presented with a context set of $K$ items, chosen randomly from a potentially …

Cited by 38 Related articles All 5 versions

[PDF] arxiv.org

Comparison-based conversational recommender system with relative bandit feedback

Z Xie, T Yu, C Zhao, S Li - Proceedings of the 44th International ACM …, 2021 - dl.acm.org

With the recent advances of conversational recommendations, the recommender system is
able to actively and dynamically elicit user preference via conversational interactions. To …

Cited by 44 Related articles All 4 versions

[PDF] mlr.press

Stochastic contextual dueling bandits under linear stochastic transitivity models

V Bengs, A Saha, E Hüllermeier - … Conference on Machine …, 2022 - proceedings.mlr.press

We consider the regret minimization task in a dueling bandits problem with context
information. In every round of the sequential decision problem, the learner makes a context …

Cited by 25 Related articles All 5 versions

[PDF] mlr.press

Batched dueling bandits

A Agarwal, R Ghuge… - … Conference on Machine …, 2022 - proceedings.mlr.press

The K-armed dueling bandit problem, where the feedback is in the form of noisy pairwise
comparisons, has been widely studied. Previous works have only focused on the sequential …

Cited by 13 Related articles All 5 versions

[PDF] mlr.press

Think Before You Duel: Understanding Complexities of Preference Learning under Constrained Resources

R Deb, A Saha, A Banerjee - International Conference on …, 2024 - proceedings.mlr.press

We consider the problem of reward maximization in the dueling bandit setup along with
constraints on resource consumption. As in the classic dueling bandits, at each round the …

Cited by 2 Related articles All 3 versions

[PDF] neurips.cc

Choice bandits

A Agarwal, N Johnson… - Advances in neural …, 2020 - proceedings.neurips.cc

There has been much interest in recent years in the problem of dueling bandits, where on
each round the learner plays a pair of arms and receives as feedback the outcome of a …

Cited by 23 Related articles All 7 versions

[PDF] mlr.press

Anaconda: An improved dynamic regret algorithm for adaptive non-stationary dueling bandits

TK Buening, A Saha - International Conference on Artificial …, 2023 - proceedings.mlr.press

We study the problem of non-stationary dueling bandits and provide the first adaptive
dynamic regret algorithm for this problem. The only two existing attempts in this line of work …

Cited by 7 Related articles All 2 versions