Learning fair policies in multi-objective (deep) reinforcement learning with average and discounted rewards
As the operations of autonomous systems generally affect simultaneously several users, it is
crucial that their designs account for fairness considerations. In contrast to standard (deep) …
Promptable behaviors: Personalizing multi-objective rewards from human preferences
Customizing robotic behaviors to be aligned with diverse human preferences is an
underexplored challenge in the field of embodied AI. In this paper we present Promptable …
Environmental and social equity in network design of sustainable closed-loop supply chains
While the whole society is meant to benefit from sustainable development, environmental
and social fairness considerations are often overlooked in the design of supply chain …
Bandit based optimization of multiple objectives on a music streaming platform
Recommender systems powering online multi-stakeholder platforms often face the
challenge of jointly optimizing multiple objectives, in an attempt to efficiently match suppliers …
Learning fair policies in decentralized cooperative multi-agent reinforcement learning
We consider the problem of learning fair policies in (deep) cooperative multi-agent
reinforcement learning (MARL). We formalize it in a principled way as the problem of …
Optimizing generalized Gini indices for fairness in rankings
There is growing interest in designing recommender systems that aim at being fair towards
item producers or their least satisfied users. Inspired by the domain of inequality …
Survey of multiarmed bandit algorithms applied to recommendation systems
G Elena, K Milos, I Eugene - International Journal of Open …, 2021 - cyberleninka.ru
The main goal of this paper is to introduce the reader to the multiarmed bandit algorithms of
different types and to observe how the industry leveraged them in advancing …
Collaborative Bayesian optimization with fair regret
Bayesian optimization (BO) is a popular tool for optimizing complex and costly-to-evaluate
black-box objective functions. To further reduce the number of function evaluations, any …
Ad-load Balancing via Off-policy Learning in a Content Marketplace
Ad-load balancing is a critical challenge in online advertising systems, particularly in the
context of social media platforms, where the goal is to maximize user engagement and …
Regret minimization for reinforcement learning with vectorial feedback and complex objectives
WC Cheung - Advances in Neural Information Processing …, 2019 - proceedings.neurips.cc
We consider an agent who is involved in an online Markov decision process, and receives a
vector of outcomes every round. The agent aims to simultaneously optimize multiple …