Self-supervised representation learning: Introduction, advances, and challenges

L Ericsson, H Gouk, CC Loy… - IEEE Signal Processing …, 2022 - ieeexplore.ieee.org
Self-supervised representation learning (SSRL) methods aim to provide powerful, deep
feature learning without the requirement of large annotated data sets, thus alleviating the …
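As a toy illustration of the general SSRL idea the survey covers, the sketch below derives "free" pretext labels from the data itself (which rotation was applied), so a model can learn image structure without human annotation. The data, model choice, and pretext task are illustrative assumptions, not taken from the paper.

```python
# Toy sketch of a self-supervised pretext task: predict which rotation was
# applied to an unlabeled image. The labels come from the transformation
# itself, not from human annotation, which is the core idea behind SSRL.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
unlabeled_images = rng.random((200, 8, 8))   # stand-in for unlabeled images

inputs, pretext_labels = [], []
for img in unlabeled_images:
    k = rng.integers(0, 4)                    # rotation index 0..3 (0/90/180/270 deg)
    inputs.append(np.rot90(img, k).ravel())   # rotated image, flattened
    pretext_labels.append(k)                  # the pretext "label" costs nothing

# A classifier trained on this task must pick up on image structure; in practice
# the learned representation (not the classifier head) is reused downstream.
model = LogisticRegression(max_iter=1000).fit(np.stack(inputs), pretext_labels)
print("pretext accuracy:", model.score(np.stack(inputs), pretext_labels))
```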

Glm: General language model pretraining with autoregressive blank infilling

Z Du, Y Qian, X Liu, M Ding, J Qiu, Z Yang… - arXiv preprint arXiv …, 2021 - arxiv.org
There have been various types of pretraining architectures including autoencoding models
(e.g., BERT), autoregressive models (e.g., GPT), and encoder-decoder models (e.g., T5) …
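A minimal sketch of how an autoregressive blank-infilling training example can be built, in the spirit of GLM: a span is blanked out of the input (Part A) and must be regenerated left to right (Part B). The special tokens follow the paper's description, but the span sampling and data layout below are simplified assumptions, not the actual implementation.

```python
# Construct one blank-infilling example: corrupt the input with [MASK] and
# append the masked span to be predicted autoregressively after a start token.
import random

def make_blank_infilling_example(tokens, span_len=3, seed=0):
    random.seed(seed)
    start = random.randrange(0, len(tokens) - span_len)
    span = tokens[start:start + span_len]

    # Part A: corrupted input with the span replaced by a single [MASK].
    part_a = tokens[:start] + ["[MASK]"] + tokens[start + span_len:]
    # Part B: the span itself, generated left to right after a start token.
    part_b = ["[S]"] + span
    targets = span + ["[E]"]              # shifted-by-one prediction targets

    return {"input": part_a + part_b, "targets": targets}

example = make_blank_infilling_example(
    "the quick brown fox jumps over the lazy dog".split())
print(example["input"])
print(example["targets"])
```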

GPT understands, too

X Liu, Y Zheng, Z Du, M Ding, Y Qian, Z Yang, J Tang - AI Open, 2023 - Elsevier
Prompting a pretrained language model with natural language patterns has been proved
effective for natural language understanding (NLU). However, our preliminary study reveals …
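A hypothetical sketch of the continuous-prompt idea this work explores: instead of hand-written natural language patterns, a handful of prompt vectors are learned and prepended to the input embeddings of a frozen language model. The module, dimensions, and initialization below are illustrative assumptions, not the paper's code.

```python
# Learnable continuous prompt embeddings prepended to frozen LM input embeddings.
import torch
import torch.nn as nn

class ContinuousPrompt(nn.Module):
    def __init__(self, num_prompt_tokens=8, hidden_size=768):
        super().__init__()
        # Trainable pseudo-token embeddings; the backbone LM stays frozen.
        self.prompt = nn.Parameter(torch.randn(num_prompt_tokens, hidden_size) * 0.02)

    def forward(self, input_embeds):          # input_embeds: (batch, seq, hidden)
        batch = input_embeds.size(0)
        prompt = self.prompt.unsqueeze(0).expand(batch, -1, -1)
        return torch.cat([prompt, input_embeds], dim=1)

word_embeds = torch.randn(2, 16, 768)         # stand-in for frozen LM embeddings
print(ContinuousPrompt()(word_embeds).shape)  # torch.Size([2, 24, 768])
```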

An introduction to deep learning in natural language processing: Models, techniques, and tools

I Lauriola, A Lavelli, F Aiolli - Neurocomputing, 2022 - Elsevier
Natural Language Processing (NLP) is a branch of artificial intelligence that
involves the design and implementation of systems and algorithms able to interact through …

ProteinBERT: a universal deep-learning model of protein sequence and function

N Brandes, D Ofer, Y Peleg, N Rappoport… - …, 2022 - academic.oup.com
Self-supervised deep language modeling has shown unprecedented success across natural
language tasks, and has recently been repurposed to biological sequences. However …
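An illustrative sketch (not ProteinBERT's actual pipeline) of how masked language modeling carries over to proteins: each amino acid is a token, and a fraction of positions is hidden for the model to reconstruct from sequence context. The vocabulary, masking rate, and example sequence are assumptions for illustration.

```python
# Tokenize a protein sequence into amino-acid IDs and apply random masking,
# producing (input_ids, labels) pairs for a masked-LM objective.
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
token_to_id = {aa: i for i, aa in enumerate(AMINO_ACIDS)}
MASK_ID = len(AMINO_ACIDS)

def mask_protein(sequence, mask_prob=0.15, seed=0):
    random.seed(seed)
    input_ids, labels = [], []
    for aa in sequence:
        tok = token_to_id[aa]
        if random.random() < mask_prob:
            input_ids.append(MASK_ID)   # hidden position the model must predict
            labels.append(tok)
        else:
            input_ids.append(tok)
            labels.append(-100)         # common convention: ignore unmasked positions
    return input_ids, labels

ids, labels = mask_protein("MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQ")
print(ids[:10], labels[:10])
```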

A robustly optimized BERT pre-training approach with post-training

Z Liu, W Lin, Y Shi, J Zhao - China National Conference on Chinese …, 2021 - Springer
In this paper, we present a 'pre-training' + 'post-training' + 'fine-tuning' three-stage paradigm, which is a supplementary framework
for the standard 'pre-training' + 'fine-tuning' language model approach. Furthermore, based on the three-stage paradigm …
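A hedged sketch of the middle "post-training" stage: between generic pre-training and task fine-tuning, the masked-LM objective is continued on unlabeled in-domain text. The model name, the domain sentence, and the single manual optimization step are illustrative assumptions, not the paper's setup.

```python
# Stage 2 (post-training): one masked-LM step on unlabeled in-domain text,
# starting from a generically pre-trained checkpoint.
import torch
from transformers import AutoModelForMaskedLM, AutoTokenizer, DataCollatorForLanguageModeling

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForMaskedLM.from_pretrained("bert-base-uncased")
collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm_probability=0.15)
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)

domain_text = "the patient presented with acute renal failure and was admitted."
batch = collator([tokenizer(domain_text)])   # randomly masks tokens, builds labels
loss = model(**batch).loss
loss.backward()
optimizer.step()
# Stage 3 would then fine-tune these post-trained weights on the labeled task.
```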

Retentive network: A successor to transformer for large language models

Y Sun, L Dong, S Huang, S Ma, Y Xia, J Xue… - arXiv preprint arXiv …, 2023 - arxiv.org
In this work, we propose Retentive Network (RetNet) as a foundation architecture for large
language models, simultaneously achieving training parallelism, low-cost inference, and …
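A rough sketch of the recurrent form of retention described for RetNet: a single decaying state matrix is updated per step, which is what enables constant-cost per-token inference. The shapes and decay value are illustrative, and multi-scale heads, gating, and normalization from the paper are omitted.

```python
# Recurrent retention: S_n = gamma * S_{n-1} + k_n^T v_n,  o_n = q_n S_n.
import numpy as np

def recurrent_retention(Q, K, V, gamma=0.9):
    d_k, d_v = K.shape[1], V.shape[1]
    S = np.zeros((d_k, d_v))                # recurrent state, fixed size per head
    outputs = []
    for q_n, k_n, v_n in zip(Q, K, V):
        S = gamma * S + np.outer(k_n, v_n)  # decayed state update
        outputs.append(q_n @ S)             # read out with the current query
    return np.stack(outputs)

rng = np.random.default_rng(0)
seq_len, d = 6, 4
Q, K, V = (rng.standard_normal((seq_len, d)) for _ in range(3))
print(recurrent_retention(Q, K, V).shape)   # (6, 4)
```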

Calibrate before use: Improving few-shot performance of language models

Z Zhao, E Wallace, S Feng, D Klein… - … on machine learning, 2021 - proceedings.mlr.press
GPT-3 can perform numerous tasks when provided a natural language prompt that contains
a few training examples. We show that this type of few-shot learning can be unstable: the …
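A sketch of the contextual-calibration idea from this line of work: estimate the model's bias by querying it with a content-free input (e.g. "N/A") under the same few-shot prompt, then rescale label probabilities by that bias. The probability values below are made up for illustration.

```python
# Calibrate few-shot label probabilities against a content-free baseline.
import numpy as np

def calibrate(label_probs, content_free_probs):
    # W = diag(p_cf)^-1; calibrated scores are renormalized to sum to 1.
    scores = label_probs / content_free_probs
    return scores / scores.sum()

p_cf = np.array([0.7, 0.3])      # model prefers label 0 even for an "N/A" input
p_test = np.array([0.6, 0.4])    # raw prediction on a real test input
print(calibrate(p_test, p_cf))   # the bias toward label 0 is largely removed
```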

Benchmarking large language models in retrieval-augmented generation

J Chen, H Lin, X Han, L Sun - Proceedings of the AAAI Conference on …, 2024 - ojs.aaai.org
Retrieval-Augmented Generation (RAG) is a promising approach for mitigating the
hallucination of large language models (LLMs). However, existing research lacks rigorous …
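A minimal sketch of the RAG pattern being benchmarked here: retrieve passages relevant to a question and prepend them to the prompt so the model can ground its answer in external evidence. The tiny corpus, lexical retriever, and prompt format are assumptions; a real system would send `prompt` to an actual LLM.

```python
# Retrieve-then-generate: pick the most relevant passage, then build the prompt.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

corpus = [
    "The Eiffel Tower was completed in 1889 for the World's Fair in Paris.",
    "Mount Everest is the highest mountain above sea level.",
    "The Great Wall of China was built over many centuries.",
]
question = "When was the Eiffel Tower completed?"

vectorizer = TfidfVectorizer().fit(corpus + [question])
doc_vecs, q_vec = vectorizer.transform(corpus), vectorizer.transform([question])
best = cosine_similarity(q_vec, doc_vecs).argmax()

prompt = f"Context: {corpus[best]}\nQuestion: {question}\nAnswer:"
print(prompt)   # this augmented prompt would be passed to the LLM
```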

Measuring mathematical problem solving with the math dataset

D Hendrycks, C Burns, S Kadavath, A Arora… - arXiv preprint arXiv …, 2021 - arxiv.org
Many intellectual endeavors require mathematical problem solving, but this skill remains
beyond the capabilities of computers. To measure this ability in machine learning models …
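An illustrative sketch of how accuracy is typically scored on datasets of this kind: extract the final boxed answer from a model's worked solution and compare it with the reference answer. The example problem and solution strings here are made up.

```python
# Score a model solution by exact match on the last \boxed{...} answer.
import re

def extract_boxed(solution):
    matches = re.findall(r"\\boxed\{([^}]*)\}", solution)
    return matches[-1].strip() if matches else None

reference_solution = r"Adding the roots gives $-b/a = \boxed{7}$."
model_solution = r"The roots sum to $7$, so the answer is $\boxed{7}$."

correct = extract_boxed(model_solution) == extract_boxed(reference_solution)
print("correct:", correct)   # True
```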