An actor-critic algorithm for sequence prediction

W Yang, H Du, ZQ Liew, WYB Lim… - … Surveys & Tutorials, 2022 - ieeexplore.ieee.org

With the increasing demand for intelligent services, the sixth-generation (6G) wireless
networks will shift from a traditional architecture that focuses solely on a high transmission …

Cited by 308 - Related articles - All 4 versions

[PDF] researchgate.net

Artificial intelligence and internet of things in small and medium-sized enterprises: A survey

EB Hansen, S Bøgh - Journal of Manufacturing Systems, 2021 - Elsevier

Internet of things (IoT) and artificial intelligence (AI) are popular topics of Industry 4.0. Many
publications regarding these topics have been published, but they are primarily focused on …

Cited by 404 - Related articles - All 5 versions

[PDF] neurips.cc

ImageReward: Learning and evaluating human preferences for text-to-image generation

J Xu, X Liu, Y Wu, Y Tong, Q Li… - Advances in …, 2024 - proceedings.neurips.cc

We present a comprehensive solution to learn and improve text-to-image models from
human preference feedback. To begin with, we build ImageReward---the first general …

Cited by 357 - Related articles - All 6 versions

[PDF] neurips.cc

CodeRL: Mastering code generation through pretrained models and deep reinforcement learning

H Le, Y Wang, AD Gotmare… - Advances in Neural …, 2022 - proceedings.neurips.cc

Program synthesis or code generation aims to generate a program that satisfies a problem
specification. Recent approaches using large-scale pretrained language models (LMs) have …

Cited by 318 - Related articles - All 7 versions

[PDF] neurips.cc

Training language models to follow instructions with human feedback

L Ouyang, J Wu, X Jiang, D Almeida… - Advances in neural …, 2022 - proceedings.neurips.cc

Making language models bigger does not inherently make them better at following a user's
intent. For example, large language models can generate outputs that are untruthful, toxic, or …

Cited by 11140 - Related articles - All 18 versions

[PDF] arxiv.org

Aligning text-to-image models using human feedback

K Lee, H Liu, M Ryu, O Watkins, Y Du… - arXiv preprint arXiv …, 2023 - arxiv.org

Deep generative models have shown impressive results in text-to-image synthesis.
However, current text-to-image models often generate images that are inadequately aligned …

Cited by 218 - Related articles - All 2 versions

[PDF] arxiv.org

BRIO: Bringing order to abstractive summarization

Y Liu, P Liu, D Radev, G Neubig - arXiv preprint arXiv:2203.16804, 2022 - arxiv.org

Abstractive summarization models are commonly trained using maximum likelihood
estimation, which assumes a deterministic (one-point) target distribution in which an ideal …

Cited by 296 - Related articles - All 6 versions

[PDF] neurips.cc

Learning to summarize with human feedback

N Stiennon, L Ouyang, J Wu… - Advances in …, 2020 - proceedings.neurips.cc

As language models become more powerful, training and evaluation are increasingly
bottlenecked by the data and metrics used for a particular task. For example, summarization …

Cited by 1784 - Related articles - All 10 versions

[HTML] nih.gov

Transfer learning in deep reinforcement learning: A survey

Z Zhu, K Lin, AK Jain, J Zhou - IEEE Transactions on Pattern …, 2023 - ieeexplore.ieee.org

Reinforcement learning is a learning paradigm for solving sequential decision-making
problems. Recent years have witnessed remarkable progress in reinforcement learning …

Cited by 713 - Related articles - All 12 versions

[PDF] arxiv.org

Recursively summarizing books with human feedback

J Wu, L Ouyang, DM Ziegler, N Stiennon… - arXiv preprint arXiv …, 2021 - arxiv.org

A major challenge for scaling machine learning is training models to perform tasks that are
very difficult or time-consuming for humans to evaluate. We present progress on this …

Cited by 253 - Related articles - All 2 versions