A general language assistant as a laboratory for alignment
Given the broad capabilities of large language models, it should be possible to work towards
a general-purpose, text-based assistant that is aligned with human values, meaning that it is
helpful, honest, and harmless. As an initial foray in this direction we study simple baseline
techniques and evaluations, such as prompting. We find that the benefits from modest
interventions increase with model size, generalize to a variety of alignment evaluations, and
do not compromise the performance of large models. Next we investigate scaling trends for …