LUKE: Deep contextualized entity representations with entity-aware self-attention

I Yamada, A Asai, H Shindo, H Takeda… - arXiv preprint arXiv …, 2020 - arxiv.org
Entity representations are useful in natural language tasks involving entities. In this paper,
we propose new pretrained contextualized representations of words and entities based on …

It's not just size that matters: Small language models are also few-shot learners

T Schick, H Schütze - arXiv preprint arXiv:2009.07118, 2020 - arxiv.org
When scaled to hundreds of billions of parameters, pretrained language models such as
GPT-3 (Brown et al., 2020) achieve remarkable few-shot performance. However, enormous …

Domain-specific language model pretraining for biomedical natural language processing

Y Gu, R Tinn, H Cheng, M Lucas, N Usuyama… - ACM Transactions on …, 2021 - dl.acm.org
Pretraining large neural language models, such as BERT, has led to impressive gains on
many natural language processing (NLP) tasks. However, most pretraining efforts focus on …

M3Exam: A multilingual, multimodal, multilevel benchmark for examining large language models

W Zhang, M Aljunied, C Gao… - Advances in Neural …, 2023 - proceedings.neurips.cc
Despite the existence of various benchmarks for evaluating natural language processing
models, we argue that human exams are a more suitable means of evaluating general …

Pre-trained language models for text generation: A survey

J Li, T Tang, WX Zhao, JY Nie, JR Wen - ACM Computing Surveys, 2024 - dl.acm.org
Text Generation aims to produce plausible and readable text in human language from input
data. The resurgence of deep learning has greatly advanced this field, in particular, with the …

Language models of code are few-shot commonsense learners

A Madaan, S Zhou, U Alon, Y Yang… - arXiv preprint arXiv …, 2022 - arxiv.org
We address the general task of structured commonsense reasoning: given a natural
language input, the goal is to generate a graph such as an event- or a reasoning-graph. To …

SPoT: Better frozen model adaptation through soft prompt transfer

T Vu, B Lester, N Constant, R Al-Rfou, D Cer - arXiv preprint arXiv …, 2021 - arxiv.org
There has been growing interest in parameter-efficient methods to apply pre-trained
language models to downstream tasks. Building on the Prompt Tuning approach of Lester et …

Dynabench: Rethinking benchmarking in NLP

D Kiela, M Bartolo, Y Nie, D Kaushik, A Geiger… - arXiv preprint arXiv …, 2021 - arxiv.org
We introduce Dynabench, an open-source platform for dynamic dataset creation and model
benchmarking. Dynabench runs in a web browser and supports human-and-model-in-the …

Benchmarking foundation models with language-model-as-an-examiner

Y Bai, J Ying, Y Cao, X Lv, Y He… - Advances in …, 2024 - proceedings.neurips.cc
Numerous benchmarks have been established to assess the performance of foundation
models on open-ended question answering, which serves as a comprehensive test of a …

DeBERTa: Decoding-enhanced BERT with disentangled attention

P He, X Liu, J Gao, W Chen - arXiv preprint arXiv:2006.03654, 2020 - arxiv.org
Recent progress in pre-trained neural language models has significantly improved the
performance of many natural language processing (NLP) tasks. In this paper we propose a …