Post-hoc interpretability for neural NLP: A survey

A Madsen, S Reddy, S Chandar - ACM Computing Surveys, 2022 - dl.acm.org
Neural networks for NLP are becoming increasingly complex and widespread, and there is a
growing concern about whether these models are responsible to use. Explaining models helps to address …
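
As a minimal, hedged illustration of one post-hoc method family this survey covers, the sketch below computes a gradient-times-input saliency attribution; the toy classifier, its dimensions, and the random input are illustrative assumptions, not the survey's own code.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
# Toy stand-in for a trained NLP classifier (assumed, for illustration only).
model = nn.Sequential(nn.Linear(10, 32), nn.Tanh(), nn.Linear(32, 2))

x = torch.randn(1, 10, requires_grad=True)  # stand-in for input embeddings/features
logit = model(x)[0, 1]                      # explain the class-1 score
logit.backward()

# Gradient-times-input: a simple post-hoc attribution over input features.
saliency = (x.grad * x).squeeze()
print(saliency)
```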

Probing classifiers: Promises, shortcomings, and advances

Y Belinkov - Computational Linguistics, 2022 - direct.mit.edu
Probing classifiers have emerged as one of the prominent methodologies for interpreting
and analyzing deep neural network models of natural language processing. The basic idea …
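
The probing methodology trains a simple, separate classifier on a frozen model's representations to test whether a linguistic property is decodable from them. Below is a hedged sketch of that setup: the random 768-dimensional vectors stand in for real hidden states, and the synthetic labels are an assumption for illustration, not the survey's data.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Stand-in for hidden states extracted from a frozen NLP model
# (e.g., one 768-dim vector per token or sentence).
X = rng.normal(size=(1000, 768))
# Stand-in for property labels (e.g., part of speech, tense).
y = (X[:, 0] + 0.1 * rng.normal(size=1000) > 0).astype(int)

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
probe = LogisticRegression(max_iter=1000).fit(X_train, y_train)

# High probe accuracy suggests the property is encoded in the representations;
# Belinkov's survey discusses the caveats of drawing that inference.
print("probe accuracy:", accuracy_score(y_test, probe.predict(X_test)))
```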

BLOOM: A 176B-parameter open-access multilingual language model

T Le Scao, A Fan, C Akiki, E Pavlick, S Ilić, D Hesslow… - 2023 - inria.hal.science
Large language models (LLMs) have been shown to perform new tasks from only a few
demonstrations or natural language instructions. While these capabilities have led to …
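
A hedged sketch of the few-shot prompting the snippet refers to: demonstrations are concatenated into a prompt and the model completes the pattern. The small bigscience/bloom-560m checkpoint is an assumption used here so the example stays runnable; the paper's model is the full 176B BLOOM.

```python
from transformers import pipeline

generate = pipeline("text-generation", model="bigscience/bloom-560m")

# Two demonstrations, then a query; the model is expected to continue the pattern.
prompt = (
    "English: cat -> French: chat\n"
    "English: dog -> French: chien\n"
    "English: bird -> French:"
)
print(generate(prompt, max_new_tokens=3)[0]["generated_text"])
```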

Locating and editing factual associations in GPT

K Meng, D Bau, A Andonian… - Advances in Neural …, 2022 - proceedings.neurips.cc
We analyze the storage and recall of factual associations in autoregressive transformer
language models, finding evidence that these associations correspond to localized, directly …
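
The locating step in this line of work rests on causal tracing: corrupt the input, then restore one layer's clean hidden state and measure how much of the clean output recovers. The sketch below reproduces that intervention pattern on a toy MLP with PyTorch forward hooks; the architecture, shapes, and "corruption" are illustrative assumptions, not the paper's GPT setup.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
# Toy stand-in for a transformer; each Sequential entry plays the role of a layer.
model = nn.Sequential(nn.Linear(8, 16), nn.ReLU(),
                      nn.Linear(16, 16), nn.ReLU(),
                      nn.Linear(16, 4))

clean_x = torch.randn(1, 8)
corrupt_x = clean_x + torch.randn(1, 8)  # stand-in for a corrupted prompt

# 1) Cache every layer's activation on the clean run.
clean_acts = {}
def cache_hook(name):
    def hook(module, inp, out):
        clean_acts[name] = out.detach()
    return hook

handles = [layer.register_forward_hook(cache_hook(i)) for i, layer in enumerate(model)]
clean_out = model(clean_x).detach()
for h in handles:
    h.remove()

# 2) Rerun on the corrupted input, patching in the clean activation at one layer
# at a time; layers whose patch pulls the output back toward the clean run are
# candidate storage sites for the association.
for i, layer in enumerate(model):
    h = layer.register_forward_hook(lambda m, inp, out, i=i: clean_acts[i])
    dist = torch.norm(model(corrupt_x) - clean_out).item()
    h.remove()
    print(f"patch layer {i}: distance to clean output = {dist:.3f}")
```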

Do vision transformers see like convolutional neural networks?

M Raghu, T Unterthiner, S Kornblith… - Advances in neural …, 2021 - proceedings.neurips.cc
Convolutional neural networks (CNNs) have so far been the de-facto model for visual data.
Recent work has shown that (Vision) Transformer models (ViT) can achieve comparable or …
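
Raghu et al. report this comparison with centered kernel alignment (CKA) between layer representations. Below is a minimal linear-CKA sketch; the random matrices stand in for (examples × features) activation matrices and are an assumption for illustration.

```python
import numpy as np

def linear_cka(X, Y):
    # Center features, then compare Gram matrices (linear CKA).
    X = X - X.mean(axis=0)
    Y = Y - Y.mean(axis=0)
    hsic = np.linalg.norm(Y.T @ X, "fro") ** 2
    return hsic / (np.linalg.norm(X.T @ X, "fro") * np.linalg.norm(Y.T @ Y, "fro"))

rng = np.random.default_rng(0)
acts_a = rng.normal(size=(500, 64))           # e.g., one layer's activations
acts_b = acts_a @ rng.normal(size=(64, 32))   # a linearly related "layer"
print("CKA(related):", round(linear_cka(acts_a, acts_b), 3))
print("CKA(random): ", round(linear_cka(acts_a, rng.normal(size=(500, 32))), 3))
```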

On the opportunities and risks of foundation models

R Bommasani, DA Hudson, E Adeli, R Altman… - arXiv preprint arXiv …, 2021 - arxiv.org
AI is undergoing a paradigm shift with the rise of models (e.g., BERT, DALL-E, GPT-3) that are
trained on broad data at scale and are adaptable to a wide range of downstream tasks. We …

Fast model editing at scale

E Mitchell, C Lin, A Bosselut, C Finn… - arXiv preprint arXiv …, 2021 - arxiv.org
While large pre-trained models have enabled impressive results on a variety of downstream
tasks, the largest existing models still make errors, and even accurate predictions may …

Pre-trained models: Past, present and future

X Han, Z Zhang, N Ding, Y Gu, X Liu, Y Huo, J Qiu… - AI Open, 2021 - Elsevier
Large-scale pre-trained models (PTMs) such as BERT and GPT have recently achieved
great success and become a milestone in the field of artificial intelligence (AI). Owing to …

LasUIE: Unifying information extraction with latent adaptive structure-aware generative language model

H Fei, S Wu, J Li, B Li, F Li, L Qin… - Advances in …, 2022 - proceedings.neurips.cc
The latest studies have revealed the great potential of universally modeling all typical
information extraction tasks (UIE) with one generative language model (GLM), where various IE …

AutoPrompt: Eliciting knowledge from language models with automatically generated prompts

T Shin, Y Razeghi, RL Logan IV, E Wallace… - arXiv preprint arXiv …, 2020 - arxiv.org
The remarkable success of pretrained language models has motivated the study of what
kinds of knowledge these models learn during pretraining. Reformulating tasks as fill-in-the …
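
The fill-in-the-blank setup that AutoPrompt builds on queries a masked language model with a cloze template and reads its top predictions off as elicited knowledge. The sketch below shows only that manual-prompt baseline; AutoPrompt itself replaces the hand-written context with gradient-searched trigger tokens. The bert-base-uncased checkpoint is an assumption chosen to keep the example runnable.

```python
from transformers import pipeline

fill_mask = pipeline("fill-mask", model="bert-base-uncased")

# A manually written cloze prompt; AutoPrompt would instead search for trigger
# tokens that maximize the probability of the correct fill.
for pred in fill_mask("The capital of France is [MASK].", top_k=3):
    print(f"{pred['token_str']!r}  score={pred['score']:.3f}")
```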