Learning fair policies in multi-objective (deep) reinforcement learning with average and discounted rewards

U Siddique, P Weng, M Zimmer - … Conference on Machine …, 2020 - proceedings.mlr.press
As the operations of autonomous systems generally affect simultaneously several users, it is
crucial that their designs account for fairness considerations. In contrast to standard (deep) …

Promptable behaviors: Personalizing multi-objective rewards from human preferences

M Hwang, L Weihs, C Park, K Lee… - Proceedings of the …, 2024 - openaccess.thecvf.com
Customizing robotic behaviors to be aligned with diverse human preferences is an
underexplored challenge in the field of embodied AI. In this paper we present Promptable …

Environmental and social equity in network design of sustainable closed-loop supply chains

O Battaïa, R Guillaume, Z Krug, R Oloruntoba - International Journal of …, 2023 - Elsevier
While the whole society is meant to benefit from sustainable development, environmental
and social fairness considerations are often overlooked in the design of supply chain …

Bandit based optimization of multiple objectives on a music streaming platform

R Mehrotra, N Xue, M Lalmas - Proceedings of the 26th ACM SIGKDD …, 2020 - dl.acm.org
Recommender systems powering online multi-stakeholder platforms often face the
challenge of jointly optimizing multiple objectives, in an attempt to efficiently match suppliers …

Learning fair policies in decentralized cooperative multi-agent reinforcement learning

M Zimmer, C Glanois, U Siddique… - … on Machine Learning, 2021 - proceedings.mlr.press
We consider the problem of learning fair policies in (deep) cooperative multi-agent
reinforcement learning (MARL). We formalize it in a principled way as the problem of …

Optimizing generalized Gini indices for fairness in rankings

V Do, N Usunier - Proceedings of the 45th International ACM SIGIR …, 2022 - dl.acm.org
There is growing interest in designing recommender systems that aim at being fair towards
item producers or their least satisfied users. Inspired by the domain of inequality …

Survey of multiarmed bandit algorithms applied to recommendation systems

G Elena, K Milos, I Eugene - International Journal of Open …, 2021 - cyberleninka.ru
The main goal of this paper is to introduce the reader to the multiarmed bandit algorithms of
different types and to observe how the industry leveraged them in advancing …

Collaborative Bayesian optimization with fair regret

RHL Sim, Y Zhang, BKH Low… - … Conference on Machine …, 2021 - proceedings.mlr.press
Bayesian optimization (BO) is a popular tool for optimizing complex and costly-to-evaluate
black-box objective functions. To further reduce the number of function evaluations, any …

Ad-load Balancing via Off-policy Learning in a Content Marketplace

H Sagtani, MG Jhawar, R Mehrotra… - Proceedings of the 17th …, 2024 - dl.acm.org
Ad-load balancing is a critical challenge in online advertising systems, particularly in the
context of social media platforms, where the goal is to maximize user engagement and …

Regret minimization for reinforcement learning with vectorial feedback and complex objectives

WC Cheung - Advances in Neural Information Processing …, 2019 - proceedings.neurips.cc
We consider an agent who is involved in an online Markov decision process, and receives a
vector of outcomes every round. The agent aims to simultaneously optimize multiple …