Offline reinforcement learning: Tutorial, review, and perspectives on open problems

S Levine, A Kumar, G Tucker, J Fu - arXiv preprint arXiv:2005.01643, 2020 - arxiv.org

In this tutorial article, we aim to provide the reader with the conceptual tools needed to get
started on research on offline reinforcement learning algorithms: reinforcement learning …

Cited by 2094 · All 3 versions

[PDF] acm.org

A survey on causal inference

L Yao, Z Chu, S Li, Y Li, J Gao, A Zhang - ACM Transactions on …, 2021 - dl.acm.org

Causal inference has been a critical research topic across many domains, such as statistics,
computer science, education, public policy, and economics, for decades. Nowadays …

Cited by 579 · All 6 versions

[PDF] arxiv.org

Bias and debias in recommender system: A survey and future directions

J Chen, H Dong, X Wang, F Feng, M Wang… - ACM Transactions on …, 2023 - dl.acm.org

While recent years have witnessed a rapid growth of research papers on recommender
system (RS), most of the papers focus on inventing machine learning models to better fit …

Cited by 904 · All 6 versions

[PDF] arxiv.org

Top-k off-policy correction for a REINFORCE recommender system

M Chen, A Beutel, P Covington, S Jain… - Proceedings of the …, 2019 - dl.acm.org

Industrial recommender systems deal with extremely large action spaces--many millions of
items to recommend. Moreover, they need to serve billions of users, who are unique at any …

Cited by 533 · All 10 versions

[PDF] arxiv.org

Causal inference in recommender systems: A survey and future directions

C Gao, Y Zheng, W Wang, F Feng, X He… - ACM Transactions on …, 2024 - dl.acm.org

Recommender systems have become crucial in information filtering nowadays. Existing
recommender systems extract user preferences based on the correlation in data, such as …

Cited by 98 · All 4 versions

[PDF] nowpublishers.com

Introduction to multi-armed bandits

A Slivkins - Foundations and Trends® in Machine Learning, 2019 - nowpublishers.com

Multi-armed bandits are a simple but very powerful framework for algorithms that make
decisions over time under uncertainty. An enormous body of work has accumulated over the …

Cited by 1220 · All 7 versions
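The following sketch illustrates an algorithm that makes decisions over time under uncertainty. It implements UCB1, one standard bandit algorithm, chosen here for illustration rather than taken from the book's text. It assumes rewards lie in [0, 1] and uses a hypothetical `pull(arm)` callback to stand in for the environment.

```python
import math
import random
from typing import Callable, List


def ucb1(pull: Callable[[int], float], n_arms: int, horizon: int) -> List[int]:
    """Run UCB1 for `horizon` rounds; `pull(arm)` returns a reward in [0, 1]."""
    counts = [0] * n_arms
    means = [0.0] * n_arms
    history: List[int] = []
    for t in range(1, horizon + 1):
        if t <= n_arms:
            arm = t - 1  # play every arm once so each has an estimate
        else:
            # Optimism under uncertainty: empirical mean plus a confidence bonus
            # that shrinks as an arm is pulled more often.
            arm = max(
                range(n_arms),
                key=lambda a: means[a] + math.sqrt(2 * math.log(t) / counts[a]),
            )
        reward = pull(arm)
        counts[arm] += 1
        means[arm] += (reward - means[arm]) / counts[arm]  # incremental mean
        history.append(arm)
    return history


if __name__ == "__main__":
    rng = random.Random(0)
    probs = [0.2, 0.5, 0.8]  # hypothetical Bernoulli arms
    hist = ucb1(lambda a: 1.0 if rng.random() < probs[a] else 0.0, 3, 2000)
    print("share of pulls on best arm:", hist.count(2) / len(hist))
```

The confidence bonus grows only logarithmically with time, so arms that look poor are still retried occasionally (exploration), but less and less often.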

[PDF] neurips.cc

DualDICE: Behavior-agnostic estimation of discounted stationary distribution corrections

O Nachum, Y Chow, B Dai, L Li - Advances in neural …, 2019 - proceedings.neurips.cc

In many real-world reinforcement learning applications, access to the environment is limited
to a fixed dataset, instead of direct (online) interaction with the environment. When using this …

Cited by 373 · All 9 versions
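When only a fixed logged dataset is available, the simplest way to estimate a new policy's value is per-sample importance weighting, also called inverse propensity scoring (IPS). The sketch below shows that baseline. It assumes the logging policy's action probabilities were recorded with each sample.

This is not DualDICE. DualDICE instead estimates stationary distribution corrections and does not require knowing the behavior policy. The `LoggedStep` record and its field names are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass
class LoggedStep:
    context: int
    action: int
    reward: float
    propensity: float  # probability the logging (behavior) policy chose `action`


def ips_estimate(
    log: Sequence[LoggedStep], target_prob: Callable[[int, int], float]
) -> float:
    """Inverse-propensity estimate of the target policy's mean reward from logged data."""
    total = 0.0
    for step in log:
        # Importance ratio: how much more (or less) likely the target policy
        # is to take the logged action than the logging policy was.
        weight = target_prob(step.context, step.action) / step.propensity
        total += weight * step.reward
    return total / len(log)


if __name__ == "__main__":
    # Uniform logging over two actions; reward 1 when action matches context.
    log = [
        LoggedStep(0, 0, 1.0, 0.5),
        LoggedStep(0, 1, 0.0, 0.5),
        LoggedStep(1, 1, 1.0, 0.5),
        LoggedStep(1, 0, 0.0, 0.5),
    ]
    always_match = lambda c, a: 1.0 if a == c else 0.0
    print(ips_estimate(log, always_match))
```

IPS is unbiased when propensities are correct and nonzero wherever the target policy acts. Its variance grows quickly as the two policies diverge, which is one motivation for correction-based estimators like the one in this paper.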

[PDF] rishabhmehrotra.com

Towards a fair marketplace: Counterfactual evaluation of the trade-off between relevance, fairness & satisfaction in recommendation systems

R Mehrotra, J McInerney, H Bouchard… - Proceedings of the 27th …, 2018 - dl.acm.org

Two-sided marketplaces are platforms that have customers not only on the demand side (e.g.,
users), but also on the supply side (e.g., retailers, artists). While traditional recommender …

Cited by 349 · All 6 versions

[PDF] acm.org

Calibrated recommendations

H Steck - Proceedings of the 12th ACM conference on …, 2018 - dl.acm.org

When a user has watched, say, 70 romance movies and 30 action movies, then it is
reasonable to expect the personalized list of recommended movies to be comprised of about …

Cited by 296
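The 70/30 example can be made concrete with a KL-divergence miscalibration score. It compares the genre distribution p of the user's history with the genre distribution q of the recommended list. The list's distribution is smoothed as q̃ = (1 - α)q + αp, so that genres missing from the list do not make the divergence infinite.

This is a simplified sketch of the idea as we understand it, not the paper's exact procedure. It assumes each item carries exactly one genre, and it uses α = 0.01 as an illustrative default. The function names are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List


def genre_distribution(genres: List[str]) -> Dict[str, float]:
    """Fraction of items per genre (assumes one genre per item)."""
    counts = Counter(genres)
    n = sum(counts.values())
    return {g: c / n for g, c in counts.items()}


def miscalibration_kl(
    p: Dict[str, float], q: Dict[str, float], alpha: float = 0.01
) -> float:
    """KL(p || q_smoothed): 0 when the list mirrors the history, larger when it drifts."""
    total = 0.0
    for genre, pg in p.items():
        if pg == 0.0:
            continue
        # Mix a little of p into q so q_smoothed > 0 wherever p > 0.
        qg = (1 - alpha) * q.get(genre, 0.0) + alpha * pg
        total += pg * math.log(pg / qg)
    return total


if __name__ == "__main__":
    history = genre_distribution(["romance"] * 70 + ["action"] * 30)
    calibrated = genre_distribution(["romance"] * 7 + ["action"] * 3)
    all_romance = genre_distribution(["romance"] * 10)
    print(miscalibration_kl(history, calibrated))
    print(miscalibration_kl(history, all_romance))
```

A 7/3 list scores essentially zero. An all-romance list is penalized mostly through the missing action share, whose smoothed probability is only αp = 0.003.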

[PDF] jamesmc.com

Explore, exploit, and explain: personalizing explainable recommendations with bandits

J McInerney, B Lacker, S Hansen, K Higley… - Proceedings of the 12th …, 2018 - dl.acm.org

The multi-armed bandit is an important framework for balancing exploration with exploitation
in recommendation. Exploitation recommends content (e.g., products, movies, music playlists) …

Cited by 255 · All 6 versions
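The exploration/exploitation balance described in this abstract can be illustrated with epsilon-greedy, one of the simplest bandit policies. This is a generic sketch, not the method of the paper. The item names, reward estimates, and the `epsilon_greedy` helper are hypothetical.

```python
import random
from typing import Dict


def epsilon_greedy(estimates: Dict[str, float], epsilon: float, rng: random.Random) -> str:
    """With probability epsilon explore (uniform random item), else exploit the best estimate."""
    items = list(estimates)
    if rng.random() < epsilon:
        return rng.choice(items)  # explore: try something regardless of its estimate
    return max(items, key=estimates.__getitem__)  # exploit: current best guess


if __name__ == "__main__":
    rng = random.Random(0)
    estimates = {"playlist_a": 0.1, "playlist_b": 0.9, "playlist_c": 0.5}
    picks = [epsilon_greedy(estimates, 0.1, rng) for _ in range(1000)]
    print({item: picks.count(item) for item in estimates})
```

In a live system the estimates would be updated from user feedback after each recommendation. Epsilon controls how much traffic is spent learning about items that currently look worse.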