Self-supervised representation learning: Introduction, advances, and challenges

L Ericsson, H Gouk, CC Loy… - IEEE Signal Processing …, 2022 - ieeexplore.ieee.org
Self-supervised representation learning (SSRL) methods aim to provide powerful, deep
feature learning without the requirement of large annotated data sets, thus alleviating the …
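As a toy illustration of the general SSRL idea the survey covers, the sketch below derives "free" pretext labels from the data itself (which rotation was applied), so a model can learn image structure without human annotation. The data, model choice, and pretext task are illustrative assumptions, not taken from the paper.

```python
# Toy sketch of a self-supervised pretext task: predict which rotation was
# applied to an unlabeled image. The labels come from the transformation
# itself, not from human annotation, which is the core idea behind SSRL.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
unlabeled_images = rng.random((200, 8, 8))   # stand-in for unlabeled images

inputs, pretext_labels = [], []
for img in unlabeled_images:
    k = rng.integers(0, 4)                    # rotation index 0..3 (0/90/180/270 deg)
    inputs.append(np.rot90(img, k).ravel())   # rotated image, flattened
    pretext_labels.append(k)                  # the pretext "label" costs nothing

# A classifier trained on this task must pick up on image structure; in practice
# the learned representation (not the classifier head) is reused downstream.
model = LogisticRegression(max_iter=1000).fit(np.stack(inputs), pretext_labels)
print("pretext accuracy:", model.score(np.stack(inputs), pretext_labels))
```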

Glm: General language model pretraining with autoregressive blank infilling

Z Du, Y Qian, X Liu, M Ding, J Qiu, Z Yang… - arXiv preprint arXiv …, 2021 - arxiv.org
There have been various types of pretraining architectures including autoencoding models
(e.g., BERT), autoregressive models (e.g., GPT), and encoder-decoder models (e.g., T5) …
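A minimal sketch of how an autoregressive blank-infilling training example can be built, in the spirit of GLM: a span is blanked out of the input (Part A) and must be regenerated left to right (Part B). The special tokens follow the paper's description, but the span sampling and data layout below are simplified assumptions, not the actual implementation.

```python
# Construct one blank-infilling example: corrupt the input with [MASK] and
# append the masked span to be predicted autoregressively after a start token.
import random

def make_blank_infilling_example(tokens, span_len=3, seed=0):
    random.seed(seed)
    start = random.randrange(0, len(tokens) - span_len)
    span = tokens[start:start + span_len]

    # Part A: corrupted input with the span replaced by a single [MASK].
    part_a = tokens[:start] + ["[MASK]"] + tokens[start + span_len:]
    # Part B: the span itself, generated left to right after a start token.
    part_b = ["[S]"] + span
    targets = span + ["[E]"]              # shifted-by-one prediction targets

    return {"input": part_a + part_b, "targets": targets}

example = make_blank_infilling_example(
    "the quick brown fox jumps over the lazy dog".split())
print(example["input"])
print(example["targets"])
```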

GPT understands, too

X Liu, Y Zheng, Z Du, M Ding, Y Qian, Z Yang, J Tang - AI Open, 2023 - Elsevier
Prompting a pretrained language model with natural language patterns has been proved
effective for natural language understanding (NLU). However, our preliminary study reveals …
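A hypothetical sketch of the continuous-prompt idea this work explores: instead of hand-written natural language patterns, a handful of prompt vectors are learned and prepended to the input embeddings of a frozen language model. The module, dimensions, and initialization below are illustrative assumptions, not the paper's code.

```python
# Learnable continuous prompt embeddings prepended to frozen LM input embeddings.
import torch
import torch.nn as nn

class ContinuousPrompt(nn.Module):
    def __init__(self, num_prompt_tokens=8, hidden_size=768):
        super().__init__()
        # Trainable pseudo-token embeddings; the backbone LM stays frozen.
        self.prompt = nn.Parameter(torch.randn(num_prompt_tokens, hidden_size) * 0.02)

    def forward(self, input_embeds):          # input_embeds: (batch, seq, hidden)
        batch = input_embeds.size(0)
        prompt = self.prompt.unsqueeze(0).expand(batch, -1, -1)
        return torch.cat([prompt, input_embeds], dim=1)

word_embeds = torch.randn(2, 16, 768)         # stand-in for frozen LM embeddings
print(ContinuousPrompt()(word_embeds).shape)  # torch.Size([2, 24, 768])
```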

An introduction to deep learning in natural language processing: Models, techniques, and tools

I Lauriola, A Lavelli, F Aiolli - Neurocomputing, 2022 - Elsevier
Natural Language Processing (NLP) is a branch of artificial intelligence that
involves the design and implementation of systems and algorithms able to interact through …

ProteinBERT: a universal deep-learning model of protein sequence and function

N Brandes, D Ofer, Y Peleg, N Rappoport… - …, 2022 - academic.oup.com
Self-supervised deep language modeling has shown unprecedented success across natural
language tasks, and has recently been repurposed to biological sequences. However …
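An illustrative sketch (not ProteinBERT's actual pipeline) of how masked language modeling carries over to proteins: each amino acid is a token, and a fraction of positions is hidden for the model to reconstruct from sequence context. The vocabulary, masking rate, and example sequence are assumptions for illustration.

```python
# Tokenize a protein sequence into amino-acid IDs and apply random masking,
# producing (input_ids, labels) pairs for a masked-LM objective.
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
token_to_id = {aa: i for i, aa in enumerate(AMINO_ACIDS)}
MASK_ID = len(AMINO_ACIDS)

def mask_protein(sequence, mask_prob=0.15, seed=0):
    random.seed(seed)
    input_ids, labels = [], []
    for aa in sequence:
        tok = token_to_id[aa]
        if random.random() < mask_prob:
            input_ids.append(MASK_ID)   # hidden position the model must predict
            labels.append(tok)
        else:
            input_ids.append(tok)
            labels.append(-100)         # common convention: ignore unmasked positions
    return input_ids, labels

ids, labels = mask_protein("MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQ")
print(ids[:10], labels[:10])
```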

A robustly optimized BERT pre-training approach with post-training

Z Liu, W Lin, Y Shi, J Zhao - China National Conference on Chinese …, 2021 - Springer
In this paper, we present a 'pre-training' + 'post-training' + 'fine-tuning' three-stage paradigm, which is a supplementary framework
for the standard 'pre-training' + 'fine-tuning' language model approach. Furthermore, based on the three-stage paradigm …
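A hedged sketch of the middle "post-training" stage: between generic pre-training and task fine-tuning, the masked-LM objective is continued on unlabeled in-domain text. The model name, the domain sentence, and the single manual optimization step are illustrative assumptions, not the paper's setup.

```python
# Stage 2 (post-training): one masked-LM step on unlabeled in-domain text,
# starting from a generically pre-trained checkpoint.
import torch
from transformers import AutoModelForMaskedLM, AutoTokenizer, DataCollatorForLanguageModeling

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForMaskedLM.from_pretrained("bert-base-uncased")
collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm_probability=0.15)
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)

domain_text = "the patient presented with acute renal failure and was admitted."
batch = collator([tokenizer(domain_text)])   # randomly masks tokens, builds labels
loss = model(**batch).loss
loss.backward()
optimizer.step()
# Stage 3 would then fine-tune these post-trained weights on the labeled task.
```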

Retentive network: A successor to transformer for large language models

Y Sun, L Dong, S Huang, S Ma, Y Xia, J Xue… - arXiv preprint arXiv …, 2023 - arxiv.org
In this work, we propose Retentive Network (RetNet) as a foundation architecture for large
language models, simultaneously achieving training parallelism, low-cost inference, and …
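A rough sketch of the recurrent form of retention described for RetNet: a single decaying state matrix is updated per step, which is what enables constant-cost per-token inference. The shapes and decay value are illustrative, and multi-scale heads, gating, and normalization from the paper are omitted.

```python
# Recurrent retention: S_n = gamma * S_{n-1} + k_n^T v_n,  o_n = q_n S_n.
import numpy as np

def recurrent_retention(Q, K, V, gamma=0.9):
    d_k, d_v = K.shape[1], V.shape[1]
    S = np.zeros((d_k, d_v))                # recurrent state, fixed size per head
    outputs = []
    for q_n, k_n, v_n in zip(Q, K, V):
        S = gamma * S + np.outer(k_n, v_n)  # decayed state update
        outputs.append(q_n @ S)             # read out with the current query
    return np.stack(outputs)

rng = np.random.default_rng(0)
seq_len, d = 6, 4
Q, K, V = (rng.standard_normal((seq_len, d)) for _ in range(3))
print(recurrent_retention(Q, K, V).shape)   # (6, 4)
```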

Calibrate before use: Improving few-shot performance of language models

Z Zhao, E Wallace, S Feng, D Klein… - … on machine learning, 2021 - proceedings.mlr.press
GPT-3 can perform numerous tasks when provided a natural language prompt that contains
a few training examples. We show that this type of few-shot learning can be unstable: the …
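A sketch of the contextual-calibration idea from this line of work: estimate the model's bias by querying it with a content-free input (e.g. "N/A") under the same few-shot prompt, then rescale label probabilities by that bias. The probability values below are made up for illustration.

```python
# Calibrate few-shot label probabilities against a content-free baseline.
import numpy as np

def calibrate(label_probs, content_free_probs):
    # W = diag(p_cf)^-1; calibrated scores are renormalized to sum to 1.
    scores = label_probs / content_free_probs
    return scores / scores.sum()

p_cf = np.array([0.7, 0.3])      # model prefers label 0 even for an "N/A" input
p_test = np.array([0.6, 0.4])    # raw prediction on a real test input
print(calibrate(p_test, p_cf))   # the bias toward label 0 is largely removed
```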

Benchmarking large language models in retrieval-augmented generation

J Chen, H Lin, X Han, L Sun - Proceedings of the AAAI Conference on …, 2024 - ojs.aaai.org
Retrieval-Augmented Generation (RAG) is a promising approach for mitigating the
hallucination of large language models (LLMs). However, existing research lacks rigorous …
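A minimal sketch of the RAG pattern being benchmarked here: retrieve passages relevant to a question and prepend them to the prompt so the model can ground its answer in external evidence. The tiny corpus, lexical retriever, and prompt format are assumptions; a real system would send `prompt` to an actual LLM.

```python
# Retrieve-then-generate: pick the most relevant passage, then build the prompt.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

corpus = [
    "The Eiffel Tower was completed in 1889 for the World's Fair in Paris.",
    "Mount Everest is the highest mountain above sea level.",
    "The Great Wall of China was built over many centuries.",
]
question = "When was the Eiffel Tower completed?"

vectorizer = TfidfVectorizer().fit(corpus + [question])
doc_vecs, q_vec = vectorizer.transform(corpus), vectorizer.transform([question])
best = cosine_similarity(q_vec, doc_vecs).argmax()

prompt = f"Context: {corpus[best]}\nQuestion: {question}\nAnswer:"
print(prompt)   # this augmented prompt would be passed to the LLM
```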

Measuring mathematical problem solving with the math dataset

D Hendrycks, C Burns, S Kadavath, A Arora… - arXiv preprint arXiv …, 2021 - arxiv.org
Many intellectual endeavors require mathematical problem solving, but this skill remains
beyond the capabilities of computers. To measure this ability in machine learning models …
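An illustrative sketch of how accuracy is typically scored on datasets of this kind: extract the final boxed answer from a model's worked solution and compare it with the reference answer. The example problem and solution strings here are made up.

```python
# Score a model solution by exact match on the last \boxed{...} answer.
import re

def extract_boxed(solution):
    matches = re.findall(r"\\boxed\{([^}]*)\}", solution)
    return matches[-1].strip() if matches else None

reference_solution = r"Adding the roots gives $-b/a = \boxed{7}$."
model_solution = r"The roots sum to $7$, so the answer is $\boxed{7}$."

correct = extract_boxed(model_solution) == extract_boxed(reference_solution)
print("correct:", correct)   # True
```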