Interactive learning from policy-dependent human feedback

T Wu, S He, J Liu, S Sun, K Liu… - IEEE/CAA Journal of …, 2023 - ieeexplore.ieee.org

ChatGPT, an artificial intelligence generated content (AIGC) model developed by OpenAI,
has attracted world-wide attention for its capability of dealing with challenging language …

被引用次数：576 相关文章所有 4 个版本

[PDF] arxiv.org

Ai alignment: A comprehensive survey

J Ji, T Qiu, B Chen, B Zhang, H Lou, K Wang… - arXiv preprint arXiv …, 2023 - arxiv.org

AI alignment aims to make AI systems behave in line with human intentions and values. As
AI systems grow more capable, the potential large-scale risks associated with misaligned AI …

被引用次数：112 相关文章所有 3 个版本

[PDF] arxiv.org

Augmented language models: a survey

G Mialon, R Dessì, M Lomeli, C Nalmpantis… - arXiv preprint arXiv …, 2023 - arxiv.org

This survey reviews works in which language models (LMs) are augmented with reasoning
skills and the ability to use tools. The former is defined as decomposing a potentially …

被引用次数：381 相关文章所有 3 个版本

[PDF] neurips.cc

Imagereward: Learning and evaluating human preferences for text-to-image generation

J Xu, X Liu, Y Wu, Y Tong, Q Li… - Advances in …, 2024 - proceedings.neurips.cc

We present a comprehensive solution to learn and improve text-to-image models from
human preference feedback. To begin with, we build ImageReward---the first general …

被引用次数：195 相关文章所有 6 个版本

[PDF] arxiv.org

Open problems and fundamental limitations of reinforcement learning from human feedback

S Casper, X Davies, C Shi, TK Gilbert… - arXiv preprint arXiv …, 2023 - arxiv.org

Reinforcement learning from human feedback (RLHF) is a technique for training AI systems
to align with human goals. RLHF has emerged as the central method used to finetune state …

被引用次数：274 相关文章所有 6 个版本

[PDF] mlr.press

Principled reinforcement learning with human feedback from pairwise or k-wise comparisons

B Zhu, M Jordan, J Jiao - International Conference on …, 2023 - proceedings.mlr.press

We provide a theoretical framework for Reinforcement Learning with Human Feedback
(RLHF). We show that when the underlying true reward is linear, under both Bradley-Terry …

被引用次数：121 相关文章所有 8 个版本

[PDF] aaai.org

Preference ranking optimization for human alignment

F Song, B Yu, M Li, H Yu, F Huang, Y Li… - Proceedings of the AAAI …, 2024 - ojs.aaai.org

Large language models (LLMs) often contain misleading content, emphasizing the need to
align them with human values to ensure secure AI systems. Reinforcement learning from …

被引用次数：111 相关文章所有 4 个版本

[PDF] arxiv.org

Aligning text-to-image models using human feedback

K Lee, H Liu, M Ryu, O Watkins, Y Du… - arXiv preprint arXiv …, 2023 - arxiv.org

Deep generative models have shown impressive results in text-to-image synthesis.
However, current text-to-image models often generate images that are inadequately aligned …

被引用次数：146 相关文章所有 2 个版本

[PDF] openreview.net

What matters in learning from offline human demonstrations for robot manipulation

A Mandlekar, D Xu, J Wong, S Nasiriany… - arXiv preprint arXiv …, 2021 - arxiv.org

Imitating human demonstrations is a promising approach to endow robots with various
manipulation capabilities. While recent advances have been made in imitation learning and …

被引用次数：303 相关文章所有 4 个版本

[PDF] jmlr.org

Curriculum learning for reinforcement learning domains: A framework and survey

S Narvekar, B Peng, M Leonetti, J Sinapov… - Journal of Machine …, 2020 - jmlr.org

Reinforcement learning (RL) is a popular paradigm for addressing sequential decision tasks
in which the agent has only limited environmental feedback. Despite many advances over …

被引用次数：497 相关文章所有 11 个版本