LUKE: Deep contextualized entity representations with entity-aware self-attention

I Yamada, A Asai, H Shindo, H Takeda… - arXiv preprint arXiv …, 2020 - arxiv.org
Entity representations are useful in natural language tasks involving entities. In this paper,
we propose new pretrained contextualized representations of words and entities based on …

It's not just size that matters: Small language models are also few-shot learners

T Schick, H Schütze - arXiv preprint arXiv:2009.07118, 2020 - arxiv.org
When scaled to hundreds of billions of parameters, pretrained language models such as
GPT-3 (Brown et al., 2020) achieve remarkable few-shot performance. However, enormous …

Domain-specific language model pretraining for biomedical natural language processing

Y Gu, R Tinn, H Cheng, M Lucas, N Usuyama… - ACM Transactions on …, 2021 - dl.acm.org
Pretraining large neural language models, such as BERT, has led to impressive gains on
many natural language processing (NLP) tasks. However, most pretraining efforts focus on …

M3Exam: A multilingual, multimodal, multilevel benchmark for examining large language models

W Zhang, M Aljunied, C Gao… - Advances in Neural …, 2023 - proceedings.neurips.cc
Despite the existence of various benchmarks for evaluating natural language processing
models, we argue that human exams are a more suitable means of evaluating general …

Pre-trained language models for text generation: A survey

J Li, T Tang, WX Zhao, JY Nie, JR Wen - ACM Computing Surveys, 2024 - dl.acm.org
Text Generation aims to produce plausible and readable text in human language from input
data. The resurgence of deep learning has greatly advanced this field, in particular, with the …

Language models of code are few-shot commonsense learners

A Madaan, S Zhou, U Alon, Y Yang… - arXiv preprint arXiv …, 2022 - arxiv.org
We address the general task of structured commonsense reasoning: given a natural
language input, the goal is to generate a graph such as an event- or a reasoning-graph. To …

SPoT: Better frozen model adaptation through soft prompt transfer

T Vu, B Lester, N Constant, R Al-Rfou, D Cer - arXiv preprint arXiv …, 2021 - arxiv.org
There has been growing interest in parameter-efficient methods to apply pre-trained
language models to downstream tasks. Building on the Prompt Tuning approach of Lester et …

Dynabench: Rethinking benchmarking in NLP

D Kiela, M Bartolo, Y Nie, D Kaushik, A Geiger… - arXiv preprint arXiv …, 2021 - arxiv.org
We introduce Dynabench, an open-source platform for dynamic dataset creation and model
benchmarking. Dynabench runs in a web browser and supports human-and-model-in-the …

Benchmarking foundation models with language-model-as-an-examiner

Y Bai, J Ying, Y Cao, X Lv, Y He… - Advances in …, 2024 - proceedings.neurips.cc
Numerous benchmarks have been established to assess the performance of foundation
models on open-ended question answering, which serves as a comprehensive test of a …

DeBERTa: Decoding-enhanced BERT with disentangled attention

P He, X Liu, J Gao, W Chen - arXiv preprint arXiv:2006.03654, 2020 - arxiv.org
Recent progress in pre-trained neural language models has significantly improved the
performance of many natural language processing (NLP) tasks. In this paper we propose a …