AI alignment: A comprehensive survey

J Ji, T Qiu, B Chen, B Zhang, H Lou, K Wang… - arXiv preprint arXiv …, 2023 - arxiv.org
AI alignment aims to make AI systems behave in line with human intentions and values. As
AI systems grow more capable, the potential large-scale risks associated with misaligned AI …

Language model behavior: A comprehensive survey

TA Chang, BK Bergen - Computational Linguistics, 2024 - direct.mit.edu
Transformer language models have received widespread public attention, yet their
generated text is often surprising even to NLP researchers. In this survey, we discuss over …

Mass-editing memory in a transformer

K Meng, AS Sharma, A Andonian, Y Belinkov… - arXiv preprint arXiv …, 2022 - arxiv.org
Recent work has shown exciting promise in updating large language models with new
memories, so as to replace obsolete information or add specialized knowledge. However …
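
The snippet above describes inserting new memories directly into a trained model's weights. As a rough, hedged illustration of the linear associative-memory view that underlies such editing methods (not the MEMIT algorithm itself, which spreads covariance-weighted closed-form edits over several MLP layers), a toy rank-one edit that rewires one key-value association looks like this; all names and dimensions are illustrative:

```python
import numpy as np

# Toy illustration of the associative-memory view behind weight-editing methods
# (NOT the MEMIT algorithm itself). A matrix W maps "key" vectors to "value"
# vectors; we insert one new (k*, v*) pair with a least-change rank-one update
# so that W' k* = v* while other keys are perturbed as little as possible.
rng = np.random.default_rng(0)
d = 64
W = rng.normal(scale=0.1, size=(d, d))

k_star = rng.normal(size=d)   # key, e.g. a representation of the edited subject
v_star = rng.normal(size=d)   # value encoding the new fact to store

residual = v_star - W @ k_star
W_edited = W + np.outer(residual, k_star) / (k_star @ k_star)

print("recall error before edit:", np.linalg.norm(W @ k_star - v_star))
print("recall error after edit: ", np.linalg.norm(W_edited @ k_star - v_star))
```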

Interpretability in the wild: a circuit for indirect object identification in gpt-2 small

K Wang, A Variengien, A Conmy, B Shlegeris… - arXiv preprint arXiv …, 2022 - arxiv.org
Research in mechanistic interpretability seeks to explain behaviors of machine learning
models in terms of their internal components. However, most previous work either focuses …

Dissecting recall of factual associations in auto-regressive language models

M Geva, J Bastings, K Filippova… - arXiv preprint arXiv …, 2023 - arxiv.org
Transformer-based language models (LMs) are known to capture factual knowledge in their
parameters. While previous work looked into where factual associations are stored, only little …

Eliciting latent predictions from transformers with the tuned lens

N Belrose, Z Furman, L Smith, D Halawi… - arXiv preprint arXiv …, 2023 - arxiv.org
We analyze transformers from the perspective of iterative inference, seeking to understand
how model predictions are refined layer by layer. To do so, we train an affine probe for each …
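
The snippet describes training an affine probe per layer. A minimal sketch of that idea for a single layer, assuming cached hidden states and a frozen unembedding matrix (shapes, names, and the training loop here are illustrative, not the released implementation):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Sketch of a tuned-lens-style affine probe for ONE layer (illustrative shapes).
# Assumed inputs:
#   h_l     : hidden states at layer l,        shape (batch, seq, d_model)
#   h_final : hidden states at the last layer, shape (batch, seq, d_model)
#   unembed : the model's frozen output projection (d_model -> vocab)
d_model, vocab = 768, 50257
unembed = nn.Linear(d_model, vocab, bias=False)
unembed.weight.requires_grad_(False)

lens = nn.Linear(d_model, d_model)          # the affine translator for layer l
opt = torch.optim.Adam(lens.parameters(), lr=1e-3)

def lens_step(h_l, h_final):
    """One training step: match the probe's logits to the model's own final logits (KL)."""
    with torch.no_grad():
        target = F.log_softmax(unembed(h_final), dim=-1)
    pred = F.log_softmax(unembed(lens(h_l)), dim=-1)
    loss = F.kl_div(pred, target, log_target=True, reduction="batchmean")
    opt.zero_grad()
    loss.backward()
    opt.step()
    return loss.item()

# Usage with random tensors standing in for cached activations:
h_l, h_final = torch.randn(4, 16, d_model), torch.randn(4, 16, d_model)
print(lens_step(h_l, h_final))
```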

Birth of a transformer: A memory viewpoint

A Bietti, V Cabannes, D Bouchacourt… - Advances in …, 2024 - proceedings.neurips.cc
Large language models based on transformers have achieved great empirical successes.
However, as they are deployed more widely, there is a growing need to better understand …

Finding neurons in a haystack: Case studies with sparse probing

W Gurnee, N Nanda, M Pauly, K Harvey… - arXiv preprint arXiv …, 2023 - arxiv.org
Despite rapid adoption and deployment of large language models (LLMs), the internal
computations of these models remain opaque and poorly understood. In this work, we seek …
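
As a hedged illustration of what a sparse probe can look like in practice (one simple k-sparse variant, not necessarily the paper's exact pipeline): rank neurons by a univariate class-separation score, then fit a classifier restricted to the top-k, assuming activations have already been cached as a matrix X with binary feature labels y:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Simple k-sparse probing variant (illustrative): select the k neurons whose
# class means separate most, then fit a probe on those neurons only.
rng = np.random.default_rng(0)
n, d, k = 1000, 3072, 5                          # examples, neurons in one MLP layer, sparsity
X = rng.normal(size=(n, d)).astype(np.float32)   # stand-in for cached activations
y = rng.integers(0, 2, size=n)                   # binary feature labels
X[y == 1, 42] += 2.0                             # plant a "feature neuron" for the demo

score = np.abs(X[y == 1].mean(0) - X[y == 0].mean(0)) / (X.std(0) + 1e-6)
top_k = np.argsort(score)[-k:]

probe = LogisticRegression(max_iter=1000).fit(X[:, top_k], y)
print("selected neurons:", sorted(top_k.tolist()))
print("k-sparse probe accuracy:", probe.score(X[:, top_k], y))
```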

A review on large language models: Architectures, applications, taxonomies, open issues and challenges

MAK Raiaan, MSH Mukta, K Fatema, NM Fahad… - IEEE …, 2024 - ieeexplore.ieee.org
Large Language Models (LLMs) recently demonstrated extraordinary capability in various
natural language processing (NLP) tasks including language translation, text generation …

Function vectors in large language models

E Todd, ML Li, AS Sharma, A Mueller… - arXiv preprint arXiv …, 2023 - arxiv.org
We report the presence of a simple neural mechanism that represents an input-output
function as a vector within autoregressive transformer language models (LMs). Using causal …
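
A minimal sketch of the general idea, under the assumption that a task vector can be approximated by averaging a hidden-state activation over in-context demonstrations and then adding it back into the residual stream with a forward hook; the model, layer, and extraction point below are illustrative stand-ins, and the paper itself derives its vectors from specific attention heads via causal mediation:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Illustrative sketch: average a hidden-state activation over task demonstrations,
# then add that vector into the residual stream of a zero-shot prompt via a hook.
tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

LAYER = 6  # illustrative extraction/injection layer

def last_token_state(prompt: str) -> torch.Tensor:
    ids = tok(prompt, return_tensors="pt")
    with torch.no_grad():
        out = model(**ids, output_hidden_states=True)
    # hidden_states[i + 1] is the output of block i, so this matches the hook below.
    return out.hidden_states[LAYER + 1][0, -1]

# Demonstrations of an antonym-style task; their mean activation is the task vector.
demos = ["hot -> cold", "big -> small", "fast -> slow"]
fv = torch.stack([last_token_state(d) for d in demos]).mean(dim=0)

def add_fv(module, inputs, output):
    # A GPT-2 block returns a tuple whose first element is the residual stream.
    return (output[0] + fv,) + output[1:]

handle = model.transformer.h[LAYER].register_forward_hook(add_fv)
ids = tok("tall ->", return_tensors="pt")
with torch.no_grad():
    gen = model.generate(**ids, max_new_tokens=3)
handle.remove()
print(tok.decode(gen[0]))
```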