Probing classifiers: Promises, shortcomings, and advances

Y Belinkov - Computational Linguistics, 2022 - direct.mit.edu
Probing classifiers have emerged as one of the prominent methodologies for interpreting
and analyzing deep neural network models of natural language processing. The basic idea …
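
The snippet cuts off before the basic idea, but the recipe the survey covers is straightforward: freeze a model, extract its hidden representations, and train a small supervised classifier to predict a linguistic property from them. A minimal sketch follows; the random feature matrix, stand-in labels, and the choice of a logistic-regression probe are illustrative assumptions, not the survey's setup.

```python
# Minimal probing-classifier sketch: train a simple classifier on frozen
# representations to test whether they encode a linguistic property.
# The representations here are random stand-ins for real model activations.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n_tokens, hidden_dim, n_tags = 2000, 64, 5

reps = rng.normal(size=(n_tokens, hidden_dim))   # frozen hidden states (stand-in)
labels = rng.integers(0, n_tags, size=n_tokens)  # e.g., POS tags (stand-in)

X_train, X_test, y_train, y_test = train_test_split(reps, labels, random_state=0)

probe = LogisticRegression(max_iter=1000)  # a deliberately simple (linear) probe
probe.fit(X_train, y_train)

# High probe accuracy is read as evidence that the property is (linearly)
# decodable from the representations; on random inputs it stays near chance.
print(f"probe accuracy: {probe.score(X_test, y_test):.3f}")
```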

Analysis methods in neural language processing: A survey

Y Belinkov, J Glass - … of the Association for Computational Linguistics, 2019 - direct.mit.edu
The field of natural language processing has seen impressive progress in recent years, with
neural network models replacing many of the traditional systems. A plethora of new models …

Explainability for large language models: A survey

H Zhao, H Chen, F Yang, N Liu, H Deng, H Cai… - ACM Transactions on …, 2024 - dl.acm.org
Large language models (LLMs) have demonstrated impressive capabilities in natural
language processing. However, their internal mechanisms are still unclear and this lack of …

Analyzing multi-head self-attention: Specialized heads do the heavy lifting, the rest can be pruned

E Voita, D Talbot, F Moiseev, R Sennrich… - arXiv preprint arXiv …, 2019 - arxiv.org
Multi-head self-attention is a key component of the Transformer, a state-of-the-art
architecture for neural machine translation. In this work we evaluate the contribution made …
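
As context for the snippet, one simple way to evaluate a head's contribution is to ablate it and measure how much the layer's output changes. The sketch below does exactly that with a toy multi-head attention layer; it is a simplified proxy, not the paper's actual method (which uses layer-wise relevance propagation and a differentiable L0-style pruning objective).

```python
# Hedged sketch of head-ablation analysis: zero out one attention head at a
# time and measure the change in the layer's output.
import numpy as np

rng = np.random.default_rng(0)
seq_len, d_model, n_heads = 10, 64, 8
d_head = d_model // n_heads

x = rng.normal(size=(seq_len, d_model))
Wq, Wk, Wv = (rng.normal(scale=0.1, size=(d_model, d_model)) for _ in range(3))

def softmax(a, axis=-1):
    a = a - a.max(axis=axis, keepdims=True)
    e = np.exp(a)
    return e / e.sum(axis=axis, keepdims=True)

def attention(x, head_mask):
    # head_mask: shape (n_heads,); 1.0 keeps a head, 0.0 ablates it.
    q = (x @ Wq).reshape(seq_len, n_heads, d_head)
    k = (x @ Wk).reshape(seq_len, n_heads, d_head)
    v = (x @ Wv).reshape(seq_len, n_heads, d_head)
    scores = np.einsum("qhd,khd->hqk", q, k) / np.sqrt(d_head)
    out = np.einsum("hqk,khd->qhd", softmax(scores), v)
    out = out * head_mask[None, :, None]          # zero out ablated heads
    return out.reshape(seq_len, d_model)

baseline = attention(x, np.ones(n_heads))
for h in range(n_heads):
    mask = np.ones(n_heads)
    mask[h] = 0.0
    delta = np.linalg.norm(attention(x, mask) - baseline)
    print(f"head {h}: output change {delta:.3f}")
```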

Designing and interpreting probes with control tasks

J Hewitt, P Liang - arXiv preprint arXiv:1909.03368, 2019 - arxiv.org
Probes, supervised models trained to predict properties (like parts-of-speech) from
representations (like ELMo), have achieved high accuracy on a range of linguistic tasks. But …
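
The paper's proposal is to pair each probing task with a control task that assigns every word type a random label, and to report selectivity: probe accuracy on the real task minus accuracy on the control task. A sketch of that comparison follows; the data and the logistic-regression probe are illustrative stand-ins.

```python
# Control-task sketch in the spirit of Hewitt & Liang: the control labels
# have the same form as real tags but, by construction, no linguistic content.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
vocab_size, n_tokens, hidden_dim, n_tags = 200, 3000, 64, 5

word_ids = rng.integers(0, vocab_size, size=n_tokens)
reps = rng.normal(size=(n_tokens, hidden_dim))      # stand-in representations
true_tags = rng.integers(0, n_tags, size=n_tokens)  # stand-in POS tags

# Control task: each word *type* gets one fixed random tag.
type_to_random_tag = rng.integers(0, n_tags, size=vocab_size)
control_tags = type_to_random_tag[word_ids]

def probe_accuracy(X, y):
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
    return LogisticRegression(max_iter=1000).fit(X_tr, y_tr).score(X_te, y_te)

real = probe_accuracy(reps, true_tags)
control = probe_accuracy(reps, control_tags)
# High selectivity suggests the probe reads out structure in the
# representations rather than memorizing word identities.
print(f"selectivity = {real - control:.3f}  (real {real:.3f}, control {control:.3f})")
```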

What you can cram into a single vector: Probing sentence embeddings for linguistic properties

A Conneau, G Kruszewski, G Lample, L Barrault… - arXiv preprint arXiv …, 2018 - arxiv.org
Although much effort has recently been devoted to training high-quality sentence
embeddings, we still have a poor understanding of what they are capturing. "Downstream" …
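
The paper's probing tasks target surface, syntactic, and semantic properties of a sentence embedding. The sketch below illustrates one surface-level task, predicting a sentence's length bucket from a fixed embedding; the sum-of-word-vectors "encoder" and the MLP probe are illustrative stand-ins, not the paper's SentEval setup.

```python
# Probing a sentence embedding for a surface property (length bucket).
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(0)
vocab_size, dim, n_sents = 500, 64, 2000
word_vecs = rng.normal(size=(vocab_size, dim))

lengths = rng.integers(3, 30, size=n_sents)
# Stand-in encoder: sum the (random) vectors of a sentence's words.
embeddings = np.stack([
    word_vecs[rng.integers(0, vocab_size, size=n)].sum(axis=0) for n in lengths
])
buckets = np.digitize(lengths, [10, 20])  # short / medium / long

X_tr, X_te, y_tr, y_te = train_test_split(embeddings, buckets, random_state=0)
probe = MLPClassifier(hidden_layer_sizes=(64,), max_iter=500, random_state=0)
acc = probe.fit(X_tr, y_tr).score(X_te, y_te)
# Compare against the ~1/3 chance level: accuracy above it suggests
# the embedding retains length information.
print(f"length-probe accuracy: {acc:.3f}")
```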

Compositionality decomposed: How do neural networks generalise?

D Hupkes, V Dankers, M Mul, E Bruni - Journal of Artificial Intelligence …, 2020 - jair.org
Despite a multitude of empirical studies, little consensus exists on whether neural networks
are able to generalise compositionally, a controversy that, in part, stems from a lack of …

A survey on semantic processing techniques

R Mao, K He, X Zhang, G Chen, J Ni, Z Yang… - Information …, 2024 - Elsevier
Semantic processing is a fundamental research domain in computational linguistics. In the
era of powerful pre-trained language models and large language models, the advancement …

The geometry of hidden representations of large transformer models

L Valeriani, D Doimo, F Cuturello… - Advances in …, 2023 - proceedings.neurips.cc
Large transformers are powerful architectures used for self-supervised data analysis across
various data types, including protein sequences, images, and text. In these models, the …
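
The snippet stops before the paper's analysis; one geometric quantity studied in this line of work is the intrinsic dimension of a layer's representations. Below is a hedged sketch of the TwoNN estimator (Facco et al., 2017) on a synthetic point cloud standing in for real hidden states; the paper's full analysis goes well beyond this.

```python
# TwoNN intrinsic-dimension estimate from nearest-neighbor distance ratios.
import numpy as np
from sklearn.neighbors import NearestNeighbors

def twonn_intrinsic_dimension(points):
    # Distances to the two nearest neighbors of each point (column 0 is self).
    dists, _ = NearestNeighbors(n_neighbors=3).fit(points).kneighbors(points)
    mu = dists[:, 2] / dists[:, 1]           # ratio of 2nd to 1st NN distance
    return len(points) / np.sum(np.log(mu))  # maximum-likelihood estimate

rng = np.random.default_rng(0)
# A 3-dimensional manifold embedded in a 50-dimensional ambient space:
# the estimate should come out near 3 despite the high ambient dimension.
latent = rng.normal(size=(2000, 3))
embedding = latent @ rng.normal(size=(3, 50))
print(f"estimated intrinsic dimension: {twonn_intrinsic_dimension(embedding):.2f}")
```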

The bottom-up evolution of representations in the transformer: A study with machine translation and language modeling objectives

E Voita, R Sennrich, I Titov - arXiv preprint arXiv:1909.01380, 2019 - arxiv.org
We seek to understand how the representations of individual tokens and the structure of the
learned feature space evolve between layers in deep neural networks under different …
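
A common way to quantify how representations evolve between layers is a layer-to-layer similarity index. The sketch below uses linear CKA on synthetic "layers" as a stand-in metric; the paper itself relies on canonical correlation and mutual-information estimates, so this is an illustration of the general analysis style, not its method.

```python
# Linear CKA between the activations of two layers.
import numpy as np

def linear_cka(X, Y):
    # X, Y: (n_tokens, dim) activations from two layers.
    X = X - X.mean(axis=0)
    Y = Y - Y.mean(axis=0)
    num = np.linalg.norm(Y.T @ X, "fro") ** 2
    return num / (np.linalg.norm(X.T @ X, "fro") * np.linalg.norm(Y.T @ Y, "fro"))

rng = np.random.default_rng(0)
n_tokens, dim, n_layers = 500, 64, 6

# Stand-in "layers": each one is a small random perturbation of the previous,
# so similarity to layer 0 should decay smoothly with depth.
layers = [rng.normal(size=(n_tokens, dim))]
for _ in range(n_layers - 1):
    layers.append(layers[-1] + 0.5 * rng.normal(size=(n_tokens, dim)))

for i in range(1, n_layers):
    print(f"CKA(layer 0, layer {i}) = {linear_cka(layers[0], layers[i]):.3f}")
```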