A generalized algorithm for multi-objective reinforcement learning and policy adaptation

B Hambly, R Xu, H Yang - Mathematical Finance, 2023 - Wiley Online Library

The rapid changes in the finance industry due to the increasing amount of data have
revolutionized the techniques on data processing and data analysis and brought new …

Cited by 131 · Related articles · All 13 versions

[PDF] cell.com

Goals, usefulness and abstraction in value-based choice

B De Martino, A Cortese - Trends in Cognitive Sciences, 2023 - cell.com

Colombian drug lord Pablo Escobar, while on the run, purportedly burned two
million dollars in banknotes to keep his daughter warm. A stark reminder that, in life …

Cited by 31 · Related articles · All 7 versions

[PDF] neurips.cc

Rewarded soups: towards Pareto-optimal alignment by interpolating weights fine-tuned on diverse rewards

A Rame, G Couairon, C Dancette… - Advances in …, 2024 - proceedings.neurips.cc

Foundation models are first pre-trained on vast unsupervised datasets and then fine-tuned
on labeled data. Reinforcement learning, notably from human feedback (RLHF), can further …

Cited by 53 · Related articles · All 7 versions

[PDF] springer.com

A practical guide to multi-objective reinforcement learning and planning

CF Hayes, R Rădulescu, E Bargiacchi… - Autonomous Agents and …, 2022 - Springer

Real-world sequential decision-making tasks are generally complex, requiring trade-offs
between multiple, often conflicting, objectives. Despite this, the majority of research in …

Cited by 280 · Related articles · All 21 versions

[PDF] mlr.press

Multi-objective GFlowNets

M Jain, SC Raparthy… - International …, 2023 - proceedings.mlr.press

We study the problem of generating diverse candidates in the context of Multi-Objective
Optimization. In many applications of machine learning such as drug discovery and material …

Cited by 51 · Related articles · All 7 versions

[PDF] neurips.cc

Pareto set learning for expensive multi-objective optimization

X Lin, Z Yang, X Zhang… - Advances in neural …, 2022 - proceedings.neurips.cc

Expensive multi-objective optimization problems can be found in many real-world
applications, where their objective function evaluations involve expensive computations or …

Cited by 38 · Related articles · All 6 versions

[PDF] mlr.press

Prediction-guided multi-objective reinforcement learning for continuous robot control

J Xu, Y Tian, P Ma, D Rus, S Sueda… - … on machine learning, 2020 - proceedings.mlr.press

Many real-world control problems involve conflicting objectives where we desire a dense
and high-quality set of control policies that are optimal for different objective preferences …

Cited by 137 · Related articles · All 10 versions

[PDF] arxiv.org

Personalized soups: Personalized large language model alignment via post-hoc parameter merging

J Jang, S Kim, BY Lin, Y Wang, J Hessel… - arXiv preprint arXiv …, 2023 - arxiv.org

While Reinforcement Learning from Human Feedback (RLHF) aligns Large Language
Models (LLMs) with general, aggregate human preferences, it is suboptimal for learning …

Cited by 48 · Related articles · All 2 versions

[PDF] neurips.cc

Effective diversity in population based reinforcement learning

J Parker-Holder, A Pacchiano… - Advances in …, 2020 - proceedings.neurips.cc

Exploration is a key problem in reinforcement learning, since agents can only learn from
data they acquire in the environment. With that in mind, maintaining a population of agents is …

Cited by 154 · Related articles · All 8 versions

[PDF] arxiv.org

Toward Pareto efficient fairness-utility trade-off in recommendation through reinforcement learning

Y Ge, X Zhao, L Yu, S Paul, D Hu, CC Hsieh… - Proceedings of the …, 2022 - dl.acm.org

The issue of fairness in recommendation is becoming increasingly essential as
Recommender Systems (RS) touch and influence more and more people in their daily lives …

Cited by 69 · Related articles · All 6 versions