Offline reinforcement learning: Tutorial, review, and perspectives on open problems

S Levine, A Kumar, G Tucker, J Fu - arXiv preprint arXiv:2005.01643, 2020 - arxiv.org
In this tutorial article, we aim to provide the reader with the conceptual tools needed to get
started on research on offline reinforcement learning algorithms: reinforcement learning …

A survey on causal inference

L Yao, Z Chu, S Li, Y Li, J Gao, A Zhang - ACM Transactions on …, 2021 - dl.acm.org
Causal inference has been a critical research topic across many domains, such as statistics,
computer science, education, public policy, and economics, for decades. Nowadays …

Bias and debias in recommender system: A survey and future directions

J Chen, H Dong, X Wang, F Feng, M Wang… - ACM Transactions on …, 2023 - dl.acm.org
While recent years have witnessed a rapid growth of research papers on recommender
systems (RS), most of the papers focus on inventing machine learning models to better fit …

Top-k off-policy correction for a REINFORCE recommender system

M Chen, A Beutel, P Covington, S Jain… - Proceedings of the …, 2019 - dl.acm.org
Industrial recommender systems deal with extremely large action spaces--many millions of
items to recommend. Moreover, they need to serve billions of users, who are unique at any …

Causal inference in recommender systems: A survey and future directions

C Gao, Y Zheng, W Wang, F Feng, X He… - ACM Transactions on …, 2024 - dl.acm.org
Recommender systems have become crucial in information filtering nowadays. Existing
recommender systems extract user preferences based on the correlation in data, such as …

Introduction to multi-armed bandits

A Slivkins - Foundations and Trends® in Machine Learning, 2019 - nowpublishers.com
Multi-armed bandits are a simple but very powerful framework for algorithms that make
decisions over time under uncertainty. An enormous body of work has accumulated over the …

DualDICE: Behavior-agnostic estimation of discounted stationary distribution corrections

O Nachum, Y Chow, B Dai, L Li - Advances in neural …, 2019 - proceedings.neurips.cc
In many real-world reinforcement learning applications, access to the environment is limited
to a fixed dataset, instead of direct (online) interaction with the environment. When using this …

Towards a fair marketplace: Counterfactual evaluation of the trade-off between relevance, fairness & satisfaction in recommendation systems

R Mehrotra, J McInerney, H Bouchard… - Proceedings of the 27th …, 2018 - dl.acm.org
Two-sided marketplaces are platforms that have customers not only on the demand side (e.g.,
users), but also on the supply side (e.g., retailers, artists). While traditional recommender …

Calibrated recommendations

H Steck - Proceedings of the 12th ACM conference on …, 2018 - dl.acm.org
When a user has watched, say, 70 romance movies and 30 action movies, then it is
reasonable to expect the personalized list of recommended movies to be comprised of about …

Explore, exploit, and explain: personalizing explainable recommendations with bandits

J McInerney, B Lacker, S Hansen, K Higley… - Proceedings of the 12th …, 2018 - dl.acm.org
The multi-armed bandit is an important framework for balancing exploration with exploitation
in recommendation. Exploitation recommends content (e.g., products, movies, music playlists) …