Online learning to rank for sequential music recommendation

Y Deldjoo, M Schedl, P Knees - Computer Science Review, 2024 - Elsevier

The music domain is among the most important ones for adopting recommender systems
technology. In contrast to most other recommendation domains, which predominantly rely on …

Cited by 55; 5 versions

Is RLHF more difficult than standard RL? A theoretical perspective

Y Wang, Q Liu, C Jin - Advances in Neural Information …, 2023 - proceedings.neurips.cc

Reinforcement learning from Human Feedback (RLHF) learns from preference
signals, while standard Reinforcement Learning (RL) directly learns from reward signals …

Cited by 15; 4 versions

Positive, negative and neutral: Modeling implicit feedback in session-based news recommendation

S Gong, KQ Zhu - Proceedings of the 45th international ACM SIGIR …, 2022 - dl.acm.org

News recommendation for anonymous readers is a useful but challenging task for many
news portals, where interactions between readers and articles are limited within a temporary …

Cited by 40; 6 versions

Preference-based online learning with dueling bandits: A survey

V Bengs, R Busa-Fekete, A El Mesaoudi-Paul… - Journal of Machine …, 2021 - jmlr.org

In machine learning, the notion of multi-armed bandits refers to a class of online learning
problems, in which an agent is supposed to simultaneously explore and exploit a given set …

Cited by 110; 7 versions

Arithmetic control of LLMs for diverse user preferences: Directional preference alignment with multi-objective rewards

H Wang, Y Lin, W Xiong, R Yang, S Diao, S Qiu… - arXiv preprint arXiv …, 2024 - arxiv.org

Fine-grained control over large language models (LLMs) remains a significant challenge,
hindering their adaptability to diverse user needs. While Reinforcement Learning from …

Cited by 22; 2 versions

Is RLHF More Difficult than Standard RL?

Y Wang, Q Liu, C Jin - arXiv preprint arXiv:2306.14111, 2023 - arxiv.org

Reinforcement learning from Human Feedback (RLHF) learns from preference signals,
while standard Reinforcement Learning (RL) directly learns from reward signals …

Cited by 20; 2 versions

Carousel personalization in music streaming apps with contextual bandits

W Bendada, G Salha, T Bontempelli - … of the 14th ACM Conference on …, 2020 - dl.acm.org

Media services providers, such as music streaming platforms, frequently leverage swipeable
carousels to recommend personalized content to their users. However, selecting the most …

Cited by 56; 5 versions

Counteracting user attention bias in music streaming recommendation via reward modification

X Zhang, S Dai, J Xu, Z Dong, Q Dai… - Proceedings of the 28th …, 2022 - dl.acm.org

In streaming media applications, like music Apps, songs are recommended in a continuous
way in users' daily life. The recommended songs are played automatically although users …

Cited by 21; 2 versions

DisCover: Disentangled music representation learning for cover song identification

J Xun, S Zhang, Y Yang, J Zhu, L Deng… - Proceedings of the 46th …, 2023 - dl.acm.org

In the field of music information retrieval (MIR), cover song identification (CSI) is a
challenging task that aims to identify cover versions of a query song from a massive …

Cited by 3; 4 versions

Building cross-sectional systematic strategies by learning to rank

D Poh, B Lim, S Zohren, S Roberts - arXiv preprint arXiv:2012.07149, 2020 - arxiv.org

The success of a cross-sectional systematic strategy depends critically on accurately ranking
assets prior to portfolio construction. Contemporary techniques perform this ranking step …

Cited by 28; 10 versions