A general language assistant as a laboratory for alignment

A Askell, Y Bai, A Chen, D Drain, D Ganguli… - arXiv preprint arXiv …, 2021 - arxiv.org
Given the broad capabilities of large language models, it should be possible to work towards
a general-purpose, text-based assistant that is aligned with human values, meaning that it is
helpful, honest, and harmless. As an initial foray in this direction we study simple baseline
techniques and evaluations, such as prompting. We find that the benefits from modest
interventions increase with model size, generalize to a variety of alignment evaluations, and
do not compromise the performance of large models. Next we investigate scaling trends for …
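
The abstract mentions prompting as one of the simple baseline techniques studied. A minimal sketch of such a prompted baseline is shown below: a short helpful/honest/harmless (HHH) preamble plus an example turn is prepended to each user query before it is sent to the model. The preamble wording and the query_model stub are illustrative assumptions, not the exact prompt or interface used in the paper.

```python
# Minimal sketch of a prompting baseline: prepend an HHH-style preamble
# and an example turn to every user query. The prompt text and the
# query_model stub below are illustrative assumptions, not taken from
# the paper itself.

HHH_PREAMBLE = (
    "Below is a conversation between a human and an AI assistant. "
    "The assistant tries to be helpful, honest, and harmless.\n"
)

EXAMPLE_TURN = (
    "Human: Can you help me summarize this article?\n"
    "Assistant: Of course. Please paste the article and I will summarize it.\n"
)

def build_prompt(user_message: str) -> str:
    """Prepend the HHH preamble and example turn to the user's message."""
    return f"{HHH_PREAMBLE}\n{EXAMPLE_TURN}\nHuman: {user_message}\nAssistant:"

def query_model(prompt: str) -> str:
    """Placeholder for an actual language-model call (API or local model)."""
    raise NotImplementedError("Swap in your model call here.")

if __name__ == "__main__":
    print(build_prompt("What's a safe way to dispose of old batteries?"))
```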
