- 学术资源搜索

Ai alignment: A comprehensive survey

J Ji, T Qiu, B Chen, B Zhang, H Lou, K Wang… - arXiv preprint arXiv …, 2023 - arxiv.org

AI alignment aims to make AI systems behave in line with human intentions and values. As
AI systems grow more capable, the potential large-scale risks associated with misaligned AI …

被引用次数：130 相关文章所有 3 个版本

[PDF] arxiv.org

A review on language models as knowledge bases

B AlKhamissi, M Li, A Celikyilmaz, M Diab… - arXiv preprint arXiv …, 2022 - arxiv.org

Recently, there has been a surge of interest in the NLP community on the use of pretrained
Language Models (LMs) as Knowledge Bases (KBs). Researchers have shown that LMs …

被引用次数：131 相关文章所有 2 个版本

[PDF] arxiv.org

A survey of large language models

WX Zhao, K Zhou, J Li, T Tang, X Wang, Y Hou… - arXiv preprint arXiv …, 2023 - arxiv.org

Language is essentially a complex, intricate system of human expressions governed by
grammatical rules. It poses a significant challenge to develop capable AI algorithms for …

被引用次数：2275 相关文章所有 4 个版本

[PDF] arxiv.org

Beyond the imitation game: Quantifying and extrapolating the capabilities of language models

A Srivastava, A Rastogi, A Rao, AAM Shoeb… - arXiv preprint arXiv …, 2022 - arxiv.org

Language models demonstrate both quantitative improvement and new qualitative
capabilities with increasing scale. Despite their potentially transformative impact, these new …

被引用次数：971 相关文章所有 11 个版本

[PDF] arxiv.org

Larger language models do in-context learning differently

J Wei, J Wei, Y Tay, D Tran, A Webson, Y Lu… - arXiv preprint arXiv …, 2023 - arxiv.org

We study how in-context learning (ICL) in language models is affected by semantic priors
versus input-label mappings. We investigate two setups-ICL with flipped labels and ICL with …

被引用次数：212 相关文章所有 7 个版本

[PDF] arxiv.org

Mass-editing memory in a transformer

K Meng, AS Sharma, A Andonian, Y Belinkov… - arXiv preprint arXiv …, 2022 - arxiv.org

Recent work has shown exciting promise in updating large language models with new
memories, so as to replace obsolete information or add specialized knowledge. However …

被引用次数：308 相关文章所有 5 个版本

[PDF] neurips.cc

Towards automated circuit discovery for mechanistic interpretability

A Conmy, A Mavor-Parker, A Lynch… - Advances in …, 2023 - proceedings.neurips.cc

Through considerable effort and intuition, several recent works have reverse-engineered
nontrivial behaviors oftransformer models. This paper systematizes the mechanistic …

被引用次数：128 相关文章所有 6 个版本

[PDF] neurips.cc

Locating and editing factual associations in GPT

K Meng, D Bau, A Andonian… - Advances in Neural …, 2022 - proceedings.neurips.cc

We analyze the storage and recall of factual associations in autoregressive transformer
language models, finding evidence that these associations correspond to localized, directly …

被引用次数：690 相关文章所有 7 个版本

[PDF] acm.org

Explainability for large language models: A survey

H Zhao, H Chen, F Yang, N Liu, H Deng, H Cai… - ACM Transactions on …, 2024 - dl.acm.org

Large language models (LLMs) have demonstrated impressive capabilities in natural
language processing. However, their internal mechanisms are still unclear and this lack of …

被引用次数：227 相关文章所有 5 个版本

[PDF] arxiv.org

Editing large language models: Problems, methods, and opportunities

Y Yao, P Wang, B Tian, S Cheng, Z Li, S Deng… - arXiv preprint arXiv …, 2023 - arxiv.org

Despite the ability to train capable LLMs, the methodology for maintaining their relevancy
and rectifying errors remains elusive. To this end, the past few years have witnessed a surge …

被引用次数：157 相关文章所有 7 个版本