Joint policy-value learning for recommendation

J Chen, H Dong, X Wang, F Feng, M Wang… - ACM Transactions on …, 2023 - dl.acm.org

While recent years have witnessed a rapid growth of research papers on recommender
system (RS), most of the papers focus on inventing machine learning models to better fit …

Cited by 738 Related articles All 6 versions

[PDF] arxiv.org

On the opportunities and challenges of offline reinforcement learning for recommender systems

X Chen, S Wang, J McAuley, D Jannach… - ACM Transactions on …, 2024 - dl.acm.org

Reinforcement learning serves as a potent tool for modeling dynamic user interests within
recommender systems, garnering increasing research attention of late. However, a …

Cited by 5 Related articles All 4 versions

[PDF] acm.org

Off-policy actor-critic for recommender systems

M Chen, C Xu, V Gatto, D Jain, A Kumar… - Proceedings of the 16th …, 2022 - dl.acm.org

Industrial recommendation platforms are increasingly concerned with how to make
recommendations that cause users to enjoy their long term experience on the platform …

Cited by 42 Related articles All 2 versions

[PDF] researchgate.net

Pessimistic reward models for off-policy learning in recommendation

O Jeunen, B Goethals - Proceedings of the 15th ACM Conference on …, 2021 - dl.acm.org

Methods for bandit learning from user interactions often require a model of the reward a
certain context-action pair will yield–for example, the probability of a click on a …

Cited by 49 Related articles All 4 versions
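
Illustrative sketch (not taken from Jeunen and Goethals): the snippet describes a reward model that predicts, for each context-action pair, a quantity such as the probability of a click. The minimal Python sketch below assumes logged (context, action, click) tuples and estimates that probability by simple counting. As one hypothetical way to be "pessimistic", it can rank actions by a Hoeffding-style lower confidence bound instead of the raw mean. The names `fit_reward_model`, `lower_bound`, `choose` and the parameter `delta` are hypothetical, and the paper's actual model may differ.

```python
import math
from collections import defaultdict


def fit_reward_model(logs):
    """Count-based click-probability estimate for each (context, action) pair.

    logs: iterable of (context, action, click) tuples with click in {0, 1}.
    Returns {(context, action): (mean_click_rate, n_observations)}.
    """
    clicks = defaultdict(int)
    counts = defaultdict(int)
    for context, action, click in logs:
        clicks[(context, action)] += click
        counts[(context, action)] += 1
    return {key: (clicks[key] / counts[key], counts[key]) for key in counts}


def lower_bound(mean, n, delta=0.05):
    # Hypothetical pessimistic score: Hoeffding lower confidence bound,
    # which penalises estimates backed by few observations.
    return mean - math.sqrt(math.log(1 / delta) / (2 * n))


def choose(model, context, actions, pessimistic):
    # Pick the action with the highest (optionally pessimistic) reward estimate.
    def score(action):
        mean, n = model[(context, action)]
        return lower_bound(mean, n) if pessimistic else mean
    return max(actions, key=score)


# Action "a": 60 clicks in 100 impressions; action "b": 1 click in 1 impression.
logs = [("u1", "a", 1)] * 60 + [("u1", "a", 0)] * 40 + [("u1", "b", 1)]
model = fit_reward_model(logs)
print(choose(model, "u1", ["a", "b"], pessimistic=False))  # b
print(choose(model, "u1", ["a", "b"], pessimistic=True))   # a
```

Greedy selection trusts the single observed click on "b". The lower bound prefers the well-observed "a", whose estimate is backed by far more data.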

[PDF] acm.org

Pessimistic decision-making for recommender systems

O Jeunen, B Goethals - ACM Transactions on Recommender Systems, 2023 - dl.acm.org

Modern recommender systems are often modelled under the sequential decision-making
paradigm, where the system decides which recommendations to show in order to maximise …

Cited by 14 Related articles All 2 versions

[PDF] ruc.edu.cn

Counteracting user attention bias in music streaming recommendation via reward modification

X Zhang, S Dai, J Xu, Z Dong, Q Dai… - Proceedings of the 28th …, 2022 - dl.acm.org

In streaming media applications, like music Apps, songs are recommended in a continuous
way in users' daily life. The recommended songs are played automatically although users …

Cited by 21 Related articles All 2 versions

[PDF] arxiv.org

BLOB: A probabilistic model for recommendation that combines organic and bandit signals

O Sakhi, S Bonner, D Rohde, F Vasile - Proceedings of the 26th ACM …, 2020 - dl.acm.org

A common task for recommender systems is to build a profile of the interests of a user from
items in their browsing history and later to recommend items to the user from the same …

Cited by 39 Related articles All 8 versions

Top-k contextual bandits with equity of exposure

O Jeunen, B Goethals - Proceedings of the 15th ACM Conference on …, 2021 - dl.acm.org

The contextual bandit paradigm provides a general framework for decision-making under
uncertainty. It is theoretically well-defined and well-studied, and many personalisation use …

Cited by 24 Related articles All 2 versions

Off-Policy Learning-to-Bid with AuctionGym

O Jeunen, S Murphy, B Allison - Proceedings of the 29th ACM SIGKDD …, 2023 - dl.acm.org

Online advertising opportunities are sold through auctions, billions of times every day across
the web. Advertisers who participate in those auctions need to decide on a bidding strategy …

Cited by 5 Related articles

[PDF] ntu.edu.tw

Practical counterfactual policy learning for top-k recommendations

Y Liu, JN Yen, B Yuan, R Shi, P Yan… - Proceedings of the 28th …, 2022 - dl.acm.org

For building recommender systems, a critical task is to learn a policy with collected feedback
(e.g., ratings, clicks) to decide which items to be recommended to users. However, it has been …

Cited by 13 Related articles All 2 versions
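
Illustrative sketch: the Liu et al. snippet concerns learning a recommendation policy from collected feedback. The inverse propensity scoring (IPS) estimator is one standard building block for this. It evaluates a candidate policy offline by reweighting logged rewards. It is shown here only as a hedged illustration, not as the method of Liu et al., whose top-k approach the snippet does not detail. The sketch assumes single-item recommendations and known logging propensities. The names `ips_value` and `always` and the toy logs are hypothetical.

```python
def ips_value(logs, target_policy):
    """Inverse propensity scoring (IPS) estimate of a target policy's mean reward.

    logs: list of (context, action, reward, propensity) tuples, where propensity
          is the probability the logging policy assigned to the logged action.
    target_policy: callable (context, action) -> probability under the new policy.
    """
    total = 0.0
    for context, action, reward, propensity in logs:
        # Reweight each logged reward by how much more (or less) likely
        # the target policy is to take the logged action.
        total += target_policy(context, action) / propensity * reward
    return total / len(logs)


def always(item):
    # Hypothetical deterministic policy that always recommends `item`.
    return lambda context, action: 1.0 if action == item else 0.0


# Logged with a uniform policy over two items (propensity 0.5 each).
logs = [
    ("u1", "a", 1.0, 0.5),
    ("u1", "b", 0.0, 0.5),
    ("u2", "a", 1.0, 0.5),
    ("u2", "b", 0.0, 0.5),
]
candidates = {"always_a": always("a"), "always_b": always("b")}
best = max(candidates, key=lambda name: ips_value(logs, candidates[name]))
print(best, ips_value(logs, candidates[best]))  # always_a 1.0
```

Selecting the candidate with the highest IPS estimate is the simplest form of offline policy learning. Practical systems usually add variance control (for example, clipping large weights), which this sketch omits.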