Offline reinforcement learning: Tutorial, review, and perspectives on open problems
In this tutorial article, we aim to provide the reader with the conceptual tools needed to get
started on research on offline reinforcement learning algorithms: reinforcement learning …
A survey on causal inference
Causal inference has been a critical research topic across many domains, such as statistics,
computer science, education, public policy, and economics, for decades. Nowadays …
Bias and debias in recommender system: A survey and future directions
While recent years have witnessed a rapid growth of research papers on recommender
system (RS), most of the papers focus on inventing machine learning models to better fit …
Top-k off-policy correction for a REINFORCE recommender system
Industrial recommender systems deal with extremely large action spaces: many millions of
items to recommend. Moreover, they need to serve billions of users, who are unique at any …
Causal inference in recommender systems: A survey and future directions
Recommender systems have become crucial in information filtering nowadays. Existing
recommender systems extract user preferences based on the correlation in data, such as …
Introduction to multi-armed bandits
A Slivkins - Foundations and Trends® in Machine Learning, 2019 - nowpublishers.com
Multi-armed bandits are a simple but very powerful framework for algorithms that make
decisions over time under uncertainty. An enormous body of work has accumulated over the …
DualDICE: Behavior-agnostic estimation of discounted stationary distribution corrections
In many real-world reinforcement learning applications, access to the environment is limited
to a fixed dataset, instead of direct (online) interaction with the environment. When using this …
Towards a fair marketplace: Counterfactual evaluation of the trade-off between relevance, fairness & satisfaction in recommendation systems
Two-sided marketplaces are platforms that have customers not only on the demand side (e.g.,
users), but also on the supply side (e.g., retailers, artists). While traditional recommender …
Calibrated recommendations
H Steck - Proceedings of the 12th ACM conference on …, 2018 - dl.acm.org
When a user has watched, say, 70 romance movies and 30 action movies, then it is
reasonable to expect the personalized list of recommended movies to be comprised of about …
Explore, exploit, and explain: personalizing explainable recommendations with bandits
J McInerney, B Lacker, S Hansen, K Higley… - Proceedings of the 12th …, 2018 - dl.acm.org
The multi-armed bandit is an important framework for balancing exploration with exploitation
in recommendation. Exploitation recommends content (e.g., products, movies, music playlists) …