Post-hoc interpretability for neural NLP: A survey

A Madsen, S Reddy, S Chandar - ACM Computing Surveys, 2022 - dl.acm.org
Neural networks for NLP are becoming increasingly complex and widespread, and there is a
growing concern about whether it is responsible to use these models. Explaining models helps to address …

Probing classifiers: Promises, shortcomings, and advances

Y Belinkov - Computational Linguistics, 2022 - direct.mit.edu
Probing classifiers have emerged as one of the prominent methodologies for interpreting
and analyzing deep neural network models of natural language processing. The basic idea …
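
A minimal sketch of the probing-classifier setup: a shallow model is trained to predict a linguistic property from frozen representations, and its accuracy is read as evidence about what those representations encode. The feature vectors below are random placeholders for encoder outputs, and the dimensions, part-of-speech labels, and logistic-regression probe are illustrative assumptions rather than any specific paper's protocol.

    # Probing classifier sketch: predict a linguistic property (here, POS tags)
    # from frozen contextual representations. The representations are random
    # placeholders standing in for vectors extracted from a pretrained encoder.
    import numpy as np
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import train_test_split

    rng = np.random.default_rng(0)
    n_tokens, hidden_size, n_tags = 2000, 768, 17      # assumed sizes
    reps = rng.normal(size=(n_tokens, hidden_size))    # stand-in for encoder outputs
    tags = rng.integers(0, n_tags, size=n_tokens)      # stand-in for gold POS labels

    X_train, X_test, y_train, y_test = train_test_split(reps, tags, random_state=0)
    probe = LogisticRegression(max_iter=1000).fit(X_train, y_train)

    # High probe accuracy is commonly read as evidence that the property is
    # (linearly) recoverable from the representations; the survey discusses
    # the caveats of that reading.
    print("probe accuracy:", probe.score(X_test, y_test))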

Pre-trained models: Past, present and future

X Han, Z Zhang, N Ding, Y Gu, X Liu, Y Huo, J Qiu… - AI Open, 2021 - Elsevier
Large-scale pre-trained models (PTMs) such as BERT and GPT have recently achieved
great success and become a milestone in the field of artificial intelligence (AI). Owing to …

Accelerated antimicrobial discovery via deep generative models and molecular dynamics simulations

P Das, T Sercu, K Wadhawan, I Padhi… - Nature Biomedical …, 2021 - nature.com
The de novo design of antimicrobial therapeutics involves the exploration of a vast chemical
repertoire to find compounds with broad-spectrum potency and low toxicity. Here, we report …

How can we know what language models know?

Z Jiang, FF Xu, J Araki, G Neubig - Transactions of the Association for …, 2020 - direct.mit.edu
Recent work has presented intriguing results examining the knowledge contained in
language models (LMs) by having the LM fill in the blanks of prompts such as “Obama is a …
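
A hedged sketch of the fill-in-the-blank querying this paper analyzes, using the Hugging Face fill-mask pipeline. The bert-base-uncased model and the two paraphrased prompts are illustrative assumptions, not the paper's exact setup; the point is that the retrieved answer can depend on how the prompt is worded.

    # Cloze-style knowledge querying: ask a masked language model to fill in
    # the blank of a factual prompt, phrased in two different ways.
    from transformers import pipeline

    fill = pipeline("fill-mask", model="bert-base-uncased")

    prompts = [
        "Barack Obama was born in [MASK].",
        "The birthplace of Barack Obama is [MASK].",
    ]
    for prompt in prompts:
        top = fill(prompt, top_k=3)
        print(prompt, "->", [p["token_str"] for p in top])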

What Does BERT Look At? An Analysis of BERT's Attention

K Clark, U Khandelwal, O Levy, CD Manning - arXiv preprint arXiv:1906.04341, 2019 - fq.pkwyx.com
Large pre-trained neural networks such as BERT have had great recent success in NLP,
motivating a growing body of research investigating what aspects of language they are able …
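
A small sketch of this kind of attention analysis, assuming the Hugging Face implementation of BERT: per-layer, per-head attention weights are returned when output_attentions=True, and one head is inspected. The example sentence and the chosen layer and head are arbitrary choices for illustration.

    # Run BERT on a sentence and look at where one attention head points.
    import torch
    from transformers import BertTokenizer, BertModel

    tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
    model = BertModel.from_pretrained("bert-base-uncased", output_attentions=True)
    model.eval()

    inputs = tokenizer("The dog chased the ball because it was fast.", return_tensors="pt")
    with torch.no_grad():
        outputs = model(**inputs)

    # outputs.attentions: tuple of 12 tensors, each (batch, heads, seq_len, seq_len)
    tokens = tokenizer.convert_ids_to_tokens(inputs["input_ids"][0])
    layer, head = 7, 10                       # arbitrary layer/head to inspect
    attn = outputs.attentions[layer][0, head]
    for i, tok in enumerate(tokens):
        j = int(attn[i].argmax())
        print(f"{tok:>10} attends most to {tokens[j]}")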

BERT rediscovers the classical NLP pipeline

I Tenney, D Das, E Pavlick - arXiv preprint arXiv:1905.05950, 2019 - fq.pkwyx.com
Pre-trained text encoders have rapidly advanced the state of the art on many NLP tasks. We
focus on one such model, BERT, and aim to quantify where linguistic information is captured …

Analyzing multi-head self-attention: Specialized heads do the heavy lifting, the rest can be pruned

E Voita, D Talbot, F Moiseev, R Sennrich… - arXiv preprint arXiv …, 2019 - arxiv.org
Multi-head self-attention is a key component of the Transformer, a state-of-the-art
architecture for neural machine translation. In this work we evaluate the contribution made …
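
Voita et al. prune heads with trainable stochastic gates under L0 regularization; the sketch below is a much cruder ablation-style proxy that silences one head at a time via the head_mask argument of Hugging Face BERT and watches a single prediction. It only illustrates the underlying question of which heads matter; the sentence, target word, and threshold are arbitrary assumptions.

    # Mask one attention head at a time and see how much the probability of a
    # target word at the [MASK] position drops.
    import torch
    from transformers import BertTokenizer, BertForMaskedLM

    tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
    model = BertForMaskedLM.from_pretrained("bert-base-uncased")
    model.eval()

    text = "The quick brown fox jumps over the [MASK] dog."
    target_id = tokenizer.convert_tokens_to_ids("lazy")
    inputs = tokenizer(text, return_tensors="pt")
    mask_pos = (inputs["input_ids"][0] == tokenizer.mask_token_id).nonzero()[0, 0]

    n_layers = model.config.num_hidden_layers
    n_heads = model.config.num_attention_heads

    def target_prob(head_mask=None):
        with torch.no_grad():
            logits = model(**inputs, head_mask=head_mask).logits
        return logits[0, mask_pos].softmax(-1)[target_id].item()

    base = target_prob()
    for layer in range(n_layers):
        for head in range(n_heads):
            head_mask = torch.ones(n_layers, n_heads)
            head_mask[layer, head] = 0.0           # silence a single head
            drop = base - target_prob(head_mask)
            if drop > 0.05:                        # arbitrary threshold
                print(f"layer {layer}, head {head}: p('lazy') drops by {drop:.3f}")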

Linguistic knowledge and transferability of contextual representations

NF Liu, M Gardner, Y Belinkov, ME Peters… - arXiv preprint arXiv …, 2019 - arxiv.org
Contextual word representations derived from large-scale neural language models are
successful across a diverse set of NLP tasks, suggesting that they encode useful and …

Designing and interpreting probes with control tasks

J Hewitt, P Liang - arXiv preprint arXiv:1909.03368, 2019 - arxiv.org
Probes, supervised models trained to predict properties (like parts-of-speech) from
representations (like ELMo), have achieved high accuracy on a range of linguistic tasks. But …
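
A minimal sketch of the control-task idea, under simplifying assumptions: the same probe is trained once on the real labels and once on labels assigned randomly per word type, and the accuracy gap ("selectivity") indicates how much of the probe's success reflects the representations rather than the probe's own capacity to memorize. The representations, labels, and logistic-regression probe below are placeholders, not the paper's exact design.

    # Compare probe accuracy on real labels vs. a control task with random
    # per-type labels; their difference is the selectivity.
    import numpy as np
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import train_test_split

    rng = np.random.default_rng(0)
    n_tokens, hidden, n_tags, vocab = 3000, 768, 17, 500

    word_ids = rng.integers(0, vocab, size=n_tokens)     # word type of each token
    word_vecs = rng.normal(size=(vocab, hidden))          # stand-in per-type vectors
    reps = word_vecs[word_ids] + 0.1 * rng.normal(size=(n_tokens, hidden))
    real_tags = rng.integers(0, n_tags, size=n_tokens)    # stand-in for gold labels

    # Control task: each word type gets a fixed but randomly chosen tag.
    type_to_random_tag = rng.integers(0, n_tags, size=vocab)
    control_tags = type_to_random_tag[word_ids]

    def probe_accuracy(labels):
        X_tr, X_te, y_tr, y_te = train_test_split(reps, labels, random_state=0)
        return LogisticRegression(max_iter=1000).fit(X_tr, y_tr).score(X_te, y_te)

    linguistic = probe_accuracy(real_tags)
    control = probe_accuracy(control_tags)
    print(f"selectivity = {linguistic - control:.3f}")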