Post-hoc interpretability for neural NLP: A survey

A Madsen, S Reddy, S Chandar - ACM Computing Surveys, 2022 - dl.acm.org
Neural networks for NLP are becoming increasingly complex and widespread, and there is a
growing concern about whether it is responsible to use these models. Explaining models helps to address …

Probing classifiers: Promises, shortcomings, and advances

Y Belinkov - Computational Linguistics, 2022 - direct.mit.edu
Probing classifiers have emerged as one of the prominent methodologies for interpreting
and analyzing deep neural network models of natural language processing. The basic idea …
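
A minimal sketch of the probing-classifier setup: a shallow model is trained to predict a linguistic property from frozen representations, and its accuracy is read as evidence about what those representations encode. The feature vectors below are random placeholders for encoder outputs, and the dimensions, part-of-speech labels, and logistic-regression probe are illustrative assumptions rather than any specific paper's protocol.

    # Probing classifier sketch: predict a linguistic property (here, POS tags)
    # from frozen contextual representations. The representations are random
    # placeholders standing in for vectors extracted from a pretrained encoder.
    import numpy as np
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import train_test_split

    rng = np.random.default_rng(0)
    n_tokens, hidden_size, n_tags = 2000, 768, 17      # assumed sizes
    reps = rng.normal(size=(n_tokens, hidden_size))    # stand-in for encoder outputs
    tags = rng.integers(0, n_tags, size=n_tokens)      # stand-in for gold POS labels

    X_train, X_test, y_train, y_test = train_test_split(reps, tags, random_state=0)
    probe = LogisticRegression(max_iter=1000).fit(X_train, y_train)

    # High probe accuracy is commonly read as evidence that the property is
    # (linearly) recoverable from the representations; the survey discusses
    # the caveats of that reading.
    print("probe accuracy:", probe.score(X_test, y_test))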

Pre-trained models: Past, present and future

X Han, Z Zhang, N Ding, Y Gu, X Liu, Y Huo, J Qiu… - AI Open, 2021 - Elsevier
Large-scale pre-trained models (PTMs) such as BERT and GPT have recently achieved
great success and become a milestone in the field of artificial intelligence (AI). Owing to …

Accelerated antimicrobial discovery via deep generative models and molecular dynamics simulations

P Das, T Sercu, K Wadhawan, I Padhi… - Nature Biomedical …, 2021 - nature.com
The de novo design of antimicrobial therapeutics involves the exploration of a vast chemical
repertoire to find compounds with broad-spectrum potency and low toxicity. Here, we report …

How can we know what language models know?

Z Jiang, FF Xu, J Araki, G Neubig - Transactions of the Association for …, 2020 - direct.mit.edu
Recent work has presented intriguing results examining the knowledge contained in
language models (LMs) by having the LM fill in the blanks of prompts such as “Obama is a …
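
A hedged sketch of the fill-in-the-blank querying this paper analyzes, using the Hugging Face fill-mask pipeline. The bert-base-uncased model and the two paraphrased prompts are illustrative assumptions, not the paper's exact setup; the point is that the retrieved answer can depend on how the prompt is worded.

    # Cloze-style knowledge querying: ask a masked language model to fill in
    # the blank of a factual prompt, phrased in two different ways.
    from transformers import pipeline

    fill = pipeline("fill-mask", model="bert-base-uncased")

    prompts = [
        "Barack Obama was born in [MASK].",
        "The birthplace of Barack Obama is [MASK].",
    ]
    for prompt in prompts:
        top = fill(prompt, top_k=3)
        print(prompt, "->", [p["token_str"] for p in top])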

What Does BERT Look At? An Analysis of BERT's Attention

K Clark, U Khandelwal, O Levy, CD Manning - arXiv preprint arXiv:1906.04341, 2019 - fq.pkwyx.com
Large pre-trained neural networks such as BERT have had great recent success in NLP,
motivating a growing body of research investigating what aspects of language they are able …
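
A small sketch of this kind of attention analysis, assuming the Hugging Face implementation of BERT: per-layer, per-head attention weights are returned when output_attentions=True, and one head is inspected. The example sentence and the chosen layer and head are arbitrary choices for illustration.

    # Run BERT on a sentence and look at where one attention head points.
    import torch
    from transformers import BertTokenizer, BertModel

    tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
    model = BertModel.from_pretrained("bert-base-uncased", output_attentions=True)
    model.eval()

    inputs = tokenizer("The dog chased the ball because it was fast.", return_tensors="pt")
    with torch.no_grad():
        outputs = model(**inputs)

    # outputs.attentions: tuple of 12 tensors, each (batch, heads, seq_len, seq_len)
    tokens = tokenizer.convert_ids_to_tokens(inputs["input_ids"][0])
    layer, head = 7, 10                       # arbitrary layer/head to inspect
    attn = outputs.attentions[layer][0, head]
    for i, tok in enumerate(tokens):
        j = int(attn[i].argmax())
        print(f"{tok:>10} attends most to {tokens[j]}")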

BERT rediscovers the classical NLP pipeline

I Tenney, D Das, E Pavlick - arXiv preprint arXiv:1905.05950, 2019 - fq.pkwyx.com
Pre-trained text encoders have rapidly advanced the state of the art on many NLP tasks. We
focus on one such model, BERT, and aim to quantify where linguistic information is captured …

Analyzing multi-head self-attention: Specialized heads do the heavy lifting, the rest can be pruned

E Voita, D Talbot, F Moiseev, R Sennrich… - arXiv preprint arXiv …, 2019 - arxiv.org
Multi-head self-attention is a key component of the Transformer, a state-of-the-art
architecture for neural machine translation. In this work we evaluate the contribution made …
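
Voita et al. prune heads with trainable stochastic gates under L0 regularization; the sketch below is a much cruder ablation-style proxy that silences one head at a time via the head_mask argument of Hugging Face BERT and watches a single prediction. It only illustrates the underlying question of which heads matter; the sentence, target word, and threshold are arbitrary assumptions.

    # Mask one attention head at a time and see how much the probability of a
    # target word at the [MASK] position drops.
    import torch
    from transformers import BertTokenizer, BertForMaskedLM

    tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
    model = BertForMaskedLM.from_pretrained("bert-base-uncased")
    model.eval()

    text = "The quick brown fox jumps over the [MASK] dog."
    target_id = tokenizer.convert_tokens_to_ids("lazy")
    inputs = tokenizer(text, return_tensors="pt")
    mask_pos = (inputs["input_ids"][0] == tokenizer.mask_token_id).nonzero()[0, 0]

    n_layers = model.config.num_hidden_layers
    n_heads = model.config.num_attention_heads

    def target_prob(head_mask=None):
        with torch.no_grad():
            logits = model(**inputs, head_mask=head_mask).logits
        return logits[0, mask_pos].softmax(-1)[target_id].item()

    base = target_prob()
    for layer in range(n_layers):
        for head in range(n_heads):
            head_mask = torch.ones(n_layers, n_heads)
            head_mask[layer, head] = 0.0           # silence a single head
            drop = base - target_prob(head_mask)
            if drop > 0.05:                        # arbitrary threshold
                print(f"layer {layer}, head {head}: p('lazy') drops by {drop:.3f}")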

Linguistic knowledge and transferability of contextual representations

NF Liu, M Gardner, Y Belinkov, ME Peters… - arXiv preprint arXiv …, 2019 - arxiv.org
Contextual word representations derived from large-scale neural language models are
successful across a diverse set of NLP tasks, suggesting that they encode useful and …

Designing and interpreting probes with control tasks

J Hewitt, P Liang - arXiv preprint arXiv:1909.03368, 2019 - arxiv.org
Probes, supervised models trained to predict properties (like parts-of-speech) from
representations (like ELMo), have achieved high accuracy on a range of linguistic tasks. But …
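
A minimal sketch of the control-task idea, under simplifying assumptions: the same probe is trained once on the real labels and once on labels assigned randomly per word type, and the accuracy gap ("selectivity") indicates how much of the probe's success reflects the representations rather than the probe's own capacity to memorize. The representations, labels, and logistic-regression probe below are placeholders, not the paper's exact design.

    # Compare probe accuracy on real labels vs. a control task with random
    # per-type labels; their difference is the selectivity.
    import numpy as np
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import train_test_split

    rng = np.random.default_rng(0)
    n_tokens, hidden, n_tags, vocab = 3000, 768, 17, 500

    word_ids = rng.integers(0, vocab, size=n_tokens)     # word type of each token
    word_vecs = rng.normal(size=(vocab, hidden))          # stand-in per-type vectors
    reps = word_vecs[word_ids] + 0.1 * rng.normal(size=(n_tokens, hidden))
    real_tags = rng.integers(0, n_tags, size=n_tokens)    # stand-in for gold labels

    # Control task: each word type gets a fixed but randomly chosen tag.
    type_to_random_tag = rng.integers(0, n_tags, size=vocab)
    control_tags = type_to_random_tag[word_ids]

    def probe_accuracy(labels):
        X_tr, X_te, y_tr, y_te = train_test_split(reps, labels, random_state=0)
        return LogisticRegression(max_iter=1000).fit(X_tr, y_tr).score(X_te, y_te)

    linguistic = probe_accuracy(real_tags)
    control = probe_accuracy(control_tags)
    print(f"selectivity = {linguistic - control:.3f}")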